Abstract 

This paper reviews the connections between Graphplan's planning-graph and the dynamic 
constraint satisfaction problem and motivates the need for adapting CSP search techniques to the 
Graphplan algorithm. It then describes how explanation based learning, dependency directed back- 
tracking, dynamic variable ordering, forward checking, sticky values and random-restart search 
strategies can be adapted to Graphplan. Empirical results are provided to demonstrate that these 
augmentations improve Graphplan's performance significantly (up to 1000x speedups) on several 
benchmark problems. Special attention is paid to the explanation-based learning and dependency 
directed backtracking techniques as they are empirically found to be most useful in improving the 
performance of Graphplan. 



1. Introduction 

Graphplan (Blum & Furst, 1997) is currently one of the more efficient algorithms for solving clas- 
sical planning problems. Four of the five competing systems in the recent AIPS-98 planning com- 
petition were based on the Graphplan algorithm (McDermott, 1998). Extending the efficiency of 
the Graphplan algorithm thus seems to be a worthwhile activity. In (Kambhampati, Parker, & 
Lambrecht, 1997), we provided a reconstruction of the Graphplan algorithm to explicate its links to 
previous work in classical planning and constraint satisfaction. One specific link that was discussed 
is the connection between the process of searching Graphplan's planning graph, and solving a "dy- 
namic constraint satisfaction problem" (DCSP) (Mittal & Falkenhainer, 1990). Seen from the DCSP 
perspective, the standard backward search proposed by Blum and Furst (1997) lacks a variety of in- 
gredients that are thought to make up efficient CSP search mechanisms (Frost & Dechter, 1994; 
Bayardo & Schrag, 1997). These include forward checking, dynamic variable ordering, depen- 
dency directed backtracking and explanation-based learning (Tsang, 1993; Kambhampati, 1998). 
In (Kambhampati et al., 1997), I have suggested that it would be beneficial to study the impact of 
these extensions on the effectiveness of Graphplan's backward search. 

In this paper, I describe my experiences with adding a variety of CSP search techniques to improve 
Graphplan's backward search, including explanation-based learning (EBL) and dependency-directed 
backtracking (DDB) capabilities, dynamic variable ordering, forward checking, sticky 
values, and random-restart search strategies. Of these, the addition of EBL and DDB capabilities 
turned out to be empirically the most useful. Both EBL and DDB are based on explaining failures 
at the leaf-nodes of a search tree, and propagating those explanations upwards through the search 
tree (Kambhampati, 1998). DDB involves using the propagation of failure explanations to support 
intelligent backtracking, while EBL involves storing interior-node failure explanations, for pruning 
future search nodes. Graphplan does use a weak form of failure-driven learning that it calls "memoization." 
As we shall see in this paper, Graphplan's brand of learning is quite limited, as there is 
no explicit analysis of the reasons for failure. Instead, the explanation of failure of a search node is 
taken to be all the constraints in that search node. As explained in (Kambhampati, 1998), this not 
only eliminates the opportunities for dependency-directed backtracking, it also adversely affects the 
utility of the stored memos. 

Adding full-fledged EBL and DDB capabilities in effect gives Graphplan both the ability to 
do intelligent backtracking, and the ability to learn generalized memos that are more likely to be 
applicable in other situations. Technically, this involves generalizing conflict-directed backjumping 
(Prosser, 1993), a specialized version of the EBL/DDB strategy applicable to binary CSP problems,¹ 
to work in the context of dynamic constraint satisfaction problems (as discussed in (Kambhampati, 
1998)). Empirically, the EBL/DDB capabilities improve Graphplan's search efficiency quite 
dramatically, giving rise to up to 1000x speedups and allowing Graphplan to easily solve several 
problems that have hitherto been hard or unsolvable. In particular, I will report on my experiments 
with the benchmark problems described by Kautz and Selman (1996), as well as 4 other domains, 
some of which were used in the recent AIPS planning competition (McDermott, 1998). 

I discuss the utility issues involved in storing and using memos, and point out that the Graphplan 
memoization strategy can be seen as a very conservative form of CSP no-good learning. While this 
conservative strategy keeps the storage and retrieval costs of no-goods (the usual bane of no-good 
learning strategies) under control, it also loses some learning opportunities. I then present the use 
of "sticky values" as a way of recouping some of these losses. Empirical studies show that sticky 
values lead to a further 2-4x improvement over EBL. 

In addition to EBL and DDB, I also investigated the utility of forward checking and dynamic 
variable ordering, both in isolation and in concert with EBL and DDB. My empirical studies show 
that these capabilities typically lead to an additional 2-4x speedup over EBL/DDB, but are not by 
themselves competitive with EBL/DDB. 

Finally, I consider the utility of the EBL/DDB strategies in the context of random-restart search 
strategies (Gomes, Selman, & Kautz, 1998) that have recently been shown to be good at solv- 
ing hard combinatorial problems, including planning problems. My results show that EBL/DDB 
strategies retain their advantages even in the context of such random-restart strategies. Specifically, 
EBL/DDB strategies enable Graphplan to use the backtrack limits more effectively, allowing it to 
achieve higher solvability rates, and more optimal plans with significantly smaller backtrack and 
restart limits. 

This paper is organized as follows. In the next section, I provide some background on viewing 
Graphplan's backward search as a (dynamic) constraint satisfaction problem, and review some of 
the opportunities this view presents. In Section 3, I discuss some inefficiencies of the backtracking 
and learning methods used in normal Graphplan that motivate the need for EBL/DDB capabilities. 
Section 4 describes how EBL and DDB are added to Graphplan. Section 5 presents empirical studies 
demonstrating the usefulness of these augmentations. Section 7 investigates the utility of forward 
checking and dynamic variable ordering strategies for Graphplan. Section 8 investigates the utility 
of EBL/DDB strategies in the context of random-restart search. Section 9 discusses related work 
and Section 10 presents conclusions and some directions for further work. 

1. Binary CSP problems are those problems where all initial constraints are between pairs of variables. 
Variables: $G_1, \cdots, G_4$, $P_1, \cdots, P_6$

Domains: $G_1: \{A_1\}$, $G_2: \{A_2\}$, $G_3: \{A_3\}$, $G_4: \{A_4\}$,
$P_1: \{A_5\}$, $P_2: \{A_6, A_{11}\}$, $P_3: \{A_7\}$, $P_4: \{A_8, A_9\}$,
$P_5: \{A_{10}\}$, $P_6: \{A_{10}\}$

Constraints (normal): $P_1 = A_5 \Rightarrow P_4 \neq A_9$
$P_2 = A_6 \Rightarrow P_4 \neq A_8$
$P_2 = A_{11} \Rightarrow P_3 \neq A_7$

Constraints (Activity): $G_1 = A_1 \Rightarrow Active\{P_1, P_2, P_3\}$
$G_2 = A_2 \Rightarrow Active\{P_4\}$
$G_3 = A_3 \Rightarrow Active\{P_5\}$
$G_4 = A_4 \Rightarrow Active\{P_1, P_6\}$

Init State: $Active\{G_1, G_2, G_3, G_4\}$

(a) Planning Graph (b) DCSP 

Figure 1: A planning graph and the DCSP corresponding to it 

2. Review of Graphplan Algorithm and its Connections to DCSP 
2.1 Review of Graphplan Algorithm 

The Graphplan algorithm (Blum & Furst, 1997) can be seen as a "disjunctive" version of the forward 
state space planners (Kambhampati et al., 1997; Kambhampati, 1997). It consists of two interleaved 
phases: a forward phase, where a data structure called the "planning-graph" is incrementally extended, 
and a backward phase, where the planning-graph is searched to extract a valid plan. The planning-graph 
consists of two alternating structures, called proposition lists and action lists. Figure 1 shows 
a partial planning-graph structure. We start with the initial state as the zeroth level proposition list. 
Given a k level planning graph, the extension of the structure to level k + 1 involves introducing all 
actions whose preconditions are present in the kth level proposition list. In addition to the actions 
given in the domain model, we consider a set of dummy "persist" actions, one for each condition 
in the kth level proposition list. A "persist-C" action has C as its precondition and C as its effect. 
Once the actions are introduced, the proposition list at level k + 1 is constructed as just the union of 
the effects of all the introduced actions. The planning-graph maintains the dependency links between the 
actions at level k + 1 and their preconditions in the level k proposition list and their effects in the level k + 1 
proposition list. The planning-graph construction also involves computation and propagation of 
"mutex" constraints. The propagation starts at level 1, with the actions that are statically interfering 
with each other (i.e., their preconditions and effects are inconsistent) labeled mutex. Mutexes are 
then propagated from this level forward by using two simple rules. Two propositions at level k are 
marked mutex if all actions at level k that support one proposition are mutex with all actions that 
support the second proposition. Two actions at level k + 1 are mutex if they are statically interfering 
or if one of the propositions (preconditions) supporting the first action is mutually exclusive with 
one of the propositions supporting the second action. 
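
The level-extension step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Action` record, the function name `extend_level`, and the toy `drive`/`fly` domain are all hypothetical, and mutex computation is deliberately left out.

```python
from typing import NamedTuple

class Action(NamedTuple):
    # Hypothetical record type, not taken from any Graphplan implementation.
    name: str
    preconds: frozenset
    effects: frozenset

def extend_level(props, domain_actions):
    """Return (action list, proposition list) for the next level; mutexes are ignored."""
    # Domain actions whose preconditions all appear in the current proposition list.
    applicable = [a for a in domain_actions if a.preconds <= props]
    # One dummy persist action per proposition: C is both precondition and effect.
    persists = [Action(f"persist-{p}", frozenset([p]), frozenset([p]))
                for p in sorted(props)]
    actions = applicable + persists
    # The next proposition list is the union of the effects of all introduced actions.
    next_props = frozenset().union(*(a.effects for a in actions))
    return actions, next_props

# Toy domain (hypothetical names), used only to exercise the function.
drive = Action("drive", frozenset({"at-home"}), frozenset({"at-work"}))
fly = Action("fly", frozenset({"at-airport"}), frozenset({"at-paris"}))
level_actions, level_props = extend_level(frozenset({"at-home"}), [drive, fly])
```

Here `fly` is not introduced because its precondition is absent, while the persist action carries `at-home` forward to the next proposition list.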

The search phase on a k level planning-graph involves checking to see if there is a sub-graph 
of the planning-graph that corresponds to a valid solution to the problem. This involves starting 
with the propositions corresponding to goals at level k (if not all the goals are present, or if they are 
present but a pair of them are marked mutually exclusive, the search is abandoned right away, and 
the planning-graph is grown another level). For each of the goal propositions, we then select an action 
from the level k action list that supports it, such that no two actions selected for supporting two 
different goals are mutually exclusive (if they are, we backtrack and try to change the selection of 
actions). At this point, we recursively call the same search process on the k - 1 level planning-graph, 
with the preconditions of the actions selected at level k as the goals for the k - 1 level search. The 
search succeeds when we reach level 0 (corresponding to the initial state). 

(a) DCSP 

Variables: $G_1, \cdots, G_4$, $P_1, \cdots, P_6$

Domains: $G_1: \{A_1\}$, $G_2: \{A_2\}$, $G_3: \{A_3\}$, $G_4: \{A_4\}$,
$P_1: \{A_5\}$, $P_2: \{A_6, A_{11}\}$, $P_3: \{A_7\}$, $P_4: \{A_8, A_9\}$,
$P_5: \{A_{10}\}$, $P_6: \{A_{10}\}$

Constraints (normal): $P_1 = A_5 \Rightarrow P_4 \neq A_9$
$P_2 = A_6 \Rightarrow P_4 \neq A_8$
$P_2 = A_{11} \Rightarrow P_3 \neq A_7$

Constraints (Activity): $G_1 = A_1 \Rightarrow Active\{P_1, P_2, P_3\}$
$G_2 = A_2 \Rightarrow Active\{P_4\}$
$G_3 = A_3 \Rightarrow Active\{P_5\}$
$G_4 = A_4 \Rightarrow Active\{P_1, P_6\}$

Init State: $Active\{G_1, G_2, G_3, G_4\}$

(b) CSP 

Variables: $G_1, \cdots, G_4$, $P_1, \cdots, P_6$

Domains: $G_1: \{A_1, \perp\}$, $G_2: \{A_2, \perp\}$, $G_3: \{A_3, \perp\}$, $G_4: \{A_4, \perp\}$,
$P_1: \{A_5, \perp\}$, $P_2: \{A_6, A_{11}, \perp\}$, $P_3: \{A_7, \perp\}$, $P_4: \{A_8, A_9, \perp\}$,
$P_5: \{A_{10}, \perp\}$, $P_6: \{A_{10}, \perp\}$

Constraints (normal): $P_1 = A_5 \Rightarrow P_4 \neq A_9$
$P_2 = A_6 \Rightarrow P_4 \neq A_8$
$P_2 = A_{11} \Rightarrow P_3 \neq A_7$

Constraints (Activity): $G_1 = A_1 \Rightarrow P_1 \neq \perp \wedge P_2 \neq \perp \wedge P_3 \neq \perp$
$G_2 = A_2 \Rightarrow P_4 \neq \perp$
$G_3 = A_3 \Rightarrow P_5 \neq \perp$
$G_4 = A_4 \Rightarrow P_1 \neq \perp \wedge P_6 \neq \perp$

Init State: $G_1 \neq \perp \wedge G_2 \neq \perp \wedge G_3 \neq \perp \wedge G_4 \neq \perp$

Figure 2: Compiling a DCSP to a standard CSP 

Consider the (partial) planning graph shown in Figure 3 that Graphplan may have generated 
and is about to search for a solution. $G_1 \cdots G_4$ are the top level goals that we want to satisfy, 
and $A_1 \cdots A_4$ are the actions that support these goals in the planning graph. The specific action-precondition 
dependencies are shown by the straight line connections. The actions $A_5 \cdots A_{11}$ at the 
left-most level support the conditions $P_1 \cdots P_6$ in the planning-graph. Notice that the conditions $P_2$ 
and $P_4$ at level k - 1 are supported by two actions each. The x-marked connections between the 
actions $A_5, A_9$; $A_6, A_8$; and $A_7, A_{11}$ denote that those action pairs are mutually exclusive. (Notice 
that given these mutually exclusive relations alone, Graphplan cannot derive any mutual exclusion 
relations at the proposition level $P_1 \cdots P_6$.) 
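
The closing remark, that no proposition-level mutexes follow from these three action mutexes, can be checked mechanically. The sketch below applies the proposition mutex rule from Section 2.1 to the supporters and action mutexes of the running example; the dictionary layout and the name `props_mutex` are illustrative choices, not part of Graphplan.

```python
from itertools import combinations, product

# Supporting actions of each condition at level k-1, read off the running example.
supporters = {
    "P1": {"A5"}, "P2": {"A6", "A11"}, "P3": {"A7"},
    "P4": {"A8", "A9"}, "P5": {"A10"}, "P6": {"A10"},
}
# The three x-marked action pairs.
action_mutex = {frozenset(pair) for pair in [("A5", "A9"), ("A6", "A8"), ("A7", "A11")]}

def props_mutex(p, q):
    """True if every supporter of p is mutex with every supporter of q."""
    return all(frozenset((a, b)) in action_mutex
               for a, b in product(supporters[p], supporters[q]))

# Collect every pair of conditions that the rule would mark mutex.
derived = [(p, q) for p, q in combinations(supporters, 2) if props_mutex(p, q)]
```

For instance, $P_2$ and $P_4$ are not mutex because $A_6$ can be paired with $A_9$, so `derived` comes out empty for this example.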

2.2 Connections Between Graphplan and CSP 

The Graphplan algorithm as described above bears little resemblance to previous classical planning 
algorithms. In (Kambhampati et al., 1997), we explicate a number of important links between 
the Graphplan algorithm and previous work in planning and constraint satisfaction communities. 
Specifically, I show that a planning-graph of length k can be thought of (to a first approximation) as a 
disjunctive (unioned) version of a k-level search tree generated by a forward state-space refinement, 
with the action lists corresponding to the union of all actions appearing at the kth level, and proposition 
lists corresponding to the union of all states appearing at the kth level. The mutex constraints 
can be seen as providing (partial) information about which subsets of a proposition list actually 
correspond to legal states in the corresponding forward state-space search. The process of searching 
the planning graph to extract a valid plan from it can be seen as a dynamic constraint satisfaction 
problem. Since this last link is most relevant to the work described in this paper, I will review it 
further below. 

The dynamic constraint satisfaction problem (DCSP) (Mittal & Falkenhainer, 1990) is a generalization 
of the constraint satisfaction problem (Tsang, 1993) that is specified by a set of variables, 
activity flags for the variables, the domains of the variables, and the constraints on the legal variable-value 
combinations. In a DCSP, initially only a subset of the variables is active, and the objective is 
to find an assignment for all active variables that is consistent with the constraints among those variables. 
In addition, the DCSP specification also contains a set of "activity constraints." An activity 
constraint is of the form: "if variable $x$ takes on the value $v_x$, then the variables $y, z, w, \ldots$ become 
active." 

The correspondence between the planning-graph and the DCSP should now be clear. Specifically, 
the propositions at various levels correspond to the DCSP variables,² and the actions supporting 
them correspond to the variable domains. There are three types of constraints: action mutex 
constraints, fact (proposition) mutex constraints and subgoal activation constraints. 

Since actions are modeled as values rather than variables, action mutex constraints have to be 
modeled indirectly as constraints between propositions. If two actions $a_1$ and $a_2$ are marked mutex 
with each other in the planning graph, then for every pair of propositions $p_{11}$ and $p_{12}$ where $a_1$ is 
one of the possible supporting actions for $p_{11}$ and $a_2$ is one of the possible supporting actions for 
$p_{12}$, we have the constraint: 

$\neg (p_{11} = a_1 \wedge p_{12} = a_2)$

Fact mutex constraints are modeled as constraints that prohibit the simultaneous activation of 
the two facts. Specifically, if two propositions $p_{11}$ and $p_{12}$ are marked mutex in the planning graph, 
we have the constraint: 

$\neg (Active(p_{11}) \wedge Active(p_{12}))$

Subgoal activation constraints are implicitly specified by action preconditions: supporting an 
active proposition p with an action a makes all the propositions in the previous level corresponding 
to the preconditions of a active. 

Finally, only the propositions corresponding to the goals of the problem are "active" in the be- 
ginning. Figure 1 shows the dynamic constraint satisfaction problem corresponding to the example 
planning-graph that we discussed. 
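
As a concrete illustration, the DCSP of Figure 1 can be written down as plain data, and the "normal" constraints can be derived from the action mutexes exactly as described above. The representation (dictionaries, forbidden value pairs as frozensets) and the function name `normal_constraints` are assumptions made for this sketch.

```python
from itertools import combinations

# Variables and their domains (supporting actions) in the running example.
domains = {
    "G1": ["A1"], "G2": ["A2"], "G3": ["A3"], "G4": ["A4"],
    "P1": ["A5"], "P2": ["A6", "A11"], "P3": ["A7"],
    "P4": ["A8", "A9"], "P5": ["A10"], "P6": ["A10"],
}
# Activity constraints: (variable, value) -> variables that become active.
activates = {
    ("G1", "A1"): ["P1", "P2", "P3"],
    ("G2", "A2"): ["P4"],
    ("G3", "A3"): ["P5"],
    ("G4", "A4"): ["P1", "P6"],
}
initially_active = ["G1", "G2", "G3", "G4"]
action_mutex = [("A5", "A9"), ("A6", "A8"), ("A7", "A11")]

def normal_constraints(domains, action_mutex):
    """Turn each action mutex into forbidden pairs of (variable, value) assignments."""
    forbidden = set()
    for a1, a2 in action_mutex:
        for x, y in combinations(domains, 2):
            # Either action may support either variable of the pair.
            for u, v in ((a1, a2), (a2, a1)):
                if u in domains[x] and v in domains[y]:
                    forbidden.add(frozenset([(x, u), (y, v)]))
    return forbidden

forbidden = normal_constraints(domains, action_mutex)
```

Each element of `forbidden` corresponds to one constraint of the form $\neg (p_{11} = a_1 \wedge p_{12} = a_2)$; for this example there are three, matching the normal constraints listed in Figure 1.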

2.2.1 Solving a DCSP 

There are two ways of solving a DCSP problem. The first, direct, approach (Mittal & Falkenhainer, 
1990) involves starting with the initially active variables, and finding a satisfying assignment for 
them. This assignment may activate some new variables, and these newly activated variables are 
assigned in the second epoch. This process continues until we reach an epoch where no more new 
variables are activated (which implies success), or we are unable to give a satisfying assignment to 
the activated variables at a given epoch. In this latter case, we backtrack to the previous epoch and 
try to find an alternative satisfying assignment to those variables (backtracking further, if no other 
assignment is possible). The backward search process used by the Graphplan algorithm (Blum & 
Furst, 1997) can be seen as solving the DCSP corresponding to the planning graph in this direct 
fashion. 
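
A minimal sketch of this direct, epoch-by-epoch method with chronological backtracking follows. It assumes only binary "normal" constraints given as forbidden value pairs, omits fact mutexes, and uses the running example as data; the function name `solve_dcsp` and the data layout are illustrative, not the paper's Figure 4 algorithm.

```python
domains = {
    "G1": ["A1"], "G2": ["A2"], "G3": ["A3"], "G4": ["A4"],
    "P1": ["A5"], "P2": ["A6", "A11"], "P3": ["A7"],
    "P4": ["A8", "A9"], "P5": ["A10"], "P6": ["A10"],
}
activates = {
    ("G1", "A1"): ["P1", "P2", "P3"], ("G2", "A2"): ["P4"],
    ("G3", "A3"): ["P5"], ("G4", "A4"): ["P1", "P6"],
}
forbidden = {
    frozenset([("P1", "A5"), ("P4", "A9")]),
    frozenset([("P2", "A6"), ("P4", "A8")]),
    frozenset([("P2", "A11"), ("P3", "A7")]),
}

def solve_dcsp(domains, activates, forbidden, initially_active):
    """Return a satisfying assignment for all activated variables, or None."""
    assignment = {}

    def consistent(var, val):
        return all(frozenset([(var, val), (w, assignment[w])]) not in forbidden
                   for w in assignment)

    def assign(pending, woken):
        if not pending:
            # Epoch finished: succeed if nothing new was activated,
            # otherwise assign the newly activated variables in the next epoch.
            return True if not woken else assign(woken, [])
        var, rest = pending[0], pending[1:]
        for val in domains[var]:
            if consistent(var, val):
                assignment[var] = val
                new = [v for v in activates.get((var, val), [])
                       if v not in assignment and v not in woken]
                if assign(rest, woken + new):
                    return True
                del assignment[var]
        return False  # chronological backtracking: the caller tries its next value

    return assignment if assign(list(initially_active), []) else None

result = solve_dcsp(domains, activates, forbidden, ["G1", "G2", "G3", "G4"])
# Dropping the A5/A9 mutex (a hypothetical relaxation) makes the example solvable.
relaxed = forbidden - {frozenset([("P1", "A5"), ("P4", "A9")])}
relaxed_result = solve_dcsp(domains, activates, relaxed, ["G1", "G2", "G3", "G4"])
```

On the running example the search fails at this level, which is the situation in which Graphplan would extend the planning graph; the relaxed variant is included only to show a successful run.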

2. Note that the same literal appearing in different levels corresponds to different DCSP variables. Thus, strictly speaking, 
a literal $p$ in the proposition list at level $i$ is converted into a DCSP variable $p_i$. To keep matters simple, the 
example in Figure 1 contains syntactically different literals in different levels of the graph. 

The second approach for solving a DCSP is to first compile it into a standard CSP, and use 
the standard CSP algorithms. This compilation process is quite straightforward and is illustrated in 
Figure 2. The main idea is to introduce a new "null" value (denoted by "$\perp$") into the domains of 
each of the DCSP variables. We then model an inactive DCSP variable as a CSP variable which 
takes the value $\perp$. The constraint that a particular variable $P$ be active is modeled as $P \neq \perp$. Thus, 
an activity constraint of the form 

$G_1 = A_1 \Rightarrow Active\{P_1, P_2, P_3\}$

is compiled to the standard CSP constraint 

$G_1 = A_1 \Rightarrow P_1 \neq \perp \wedge P_2 \neq \perp \wedge P_3 \neq \perp$

It is worth noting here that the activation constraints above are only concerned with ensuring 
that propositions that are preconditions of a selected action do take non-$\perp$ values. They thus allow 
for the possibility that propositions can become active (take non-$\perp$ values) even though they are 
strictly not supporting preconditions of any selected action. Although this can lead to suboptimal 
plans, the mutex constraints ensure that no unsound plans will be produced (Kautz & Selman, 
1999). To avoid unnecessary activation of variables, we need to add constraints to the effect that 
unless one of the actions needing that variable as a precondition has been selected as the value for 
some variable in the earlier (higher) level, the variable must have the value $\perp$. Such constraints are 
typically going to have very high arity (as they wind up mentioning a large number of variables in 
the previous level), and may thus be harder to handle during search. 

Finally, a mutex constraint between two propositions 

$\neg (Active(p_{11}) \wedge Active(p_{12}))$

is compiled into 

$p_{11} = \perp \vee p_{12} = \perp$

Since action mutex constraints are already in the standard CSP form, with this compilation, all 
the activity constraints are converted into standard constraints and thus the entire CSP is now a 
standard CSP. It can now be solved by any of the standard CSP search techniques (Tsang, 1993).³ 
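
The compilation step can be sketched as follows for the running example. The names `NULL`, `compile_to_csp` and `activity_ok` are assumptions of this sketch; the checker covers only the compiled activity and init-state constraints, not the mutex constraints, which carry over unchanged.

```python
NULL = "⊥"  # the new "null" value marking an inactive variable

domains = {
    "G1": ["A1"], "G2": ["A2"], "G3": ["A3"], "G4": ["A4"],
    "P1": ["A5"], "P2": ["A6", "A11"], "P3": ["A7"],
    "P4": ["A8", "A9"], "P5": ["A10"], "P6": ["A10"],
}
activates = {
    ("G1", "A1"): ["P1", "P2", "P3"], ("G2", "A2"): ["P4"],
    ("G3", "A3"): ["P5"], ("G4", "A4"): ["P1", "P6"],
}
initially_active = ["G1", "G2", "G3", "G4"]

def compile_to_csp(domains, activates, initially_active):
    """Return (CSP domains, checker for the compiled activity constraints)."""
    # Every variable may now also take the null value.
    csp_domains = {var: list(vals) + [NULL] for var, vals in domains.items()}

    def activity_ok(assignment):
        # Init state: every initially active variable takes a non-null value.
        if any(assignment[g] == NULL for g in initially_active):
            return False
        # x = v  =>  each variable activated by (x, v) takes a non-null value.
        return all(assignment[y] != NULL
                   for (x, v), ys in activates.items() if assignment[x] == v
                   for y in ys)

    return csp_domains, activity_ok

csp_domains, activity_ok = compile_to_csp(domains, activates, initially_active)
# A complete assignment that picks the first (non-null) value of every variable.
full = {var: vals[0] for var, vals in domains.items()}
```

Setting an activated variable such as $P_4$ to `NULL` in `full` makes `activity_ok` fail, because $G_2 = A_2$ requires $P_4 \neq \perp$.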

The direct method has the advantage that it closely mirrors Graphplan's planning graph 
structure and its backward search. Because of this, it is possible to implement the approach on the 
plan graph structure without explicitly representing all the constraints. Furthermore, as I will discuss 
in Section 6, there are some distinct advantages to adopting the DCSP view in implementing 
EBL/DDB on Graphplan. The compilation to CSP requires that the plan graph first be converted into 
an extensional CSP. It does, however, allow the use of standard algorithms, and it supports non-directional 
search (in that one does not have to follow the epoch-by-epoch approach in assigning 
variables).⁴ Since my main aim is to illustrate the utility of CSP search techniques in the context of 
the Graphplan algorithm, I will adopt the direct solution method for the DCSP. For a study of the 
tradeoffs offered by the technique of compiling the planning graph into a CSP, the reader is referred 
to (Do & Kambhampati, 2000). 

3. It is also possible to compile any CSP problem to a propositional satisfiability problem (i.e., a CSP problem with 
boolean variables). This is accomplished by compiling every CSP variable $P$ that has the domain $\{v_1, v_2, \cdots, v_n\}$ 
into $n$ boolean variables of the form P-is-$v_1$ $\cdots$ P-is-$v_n$. Every constraint of the form $P = v_j \wedge \cdots \Rightarrow \cdots$ is compiled 
to P-is-$v_j$ $\wedge \cdots \Rightarrow \cdots$. This is essentially what is done by the BLACKBOX system (Kautz & Selman, 1999). 

4. Compilation to CSP is not a strict requirement for doing non-directional search. In (Zimmerman & Kambhampati, 
1999), we describe a technique that allows the backward search of Graphplan to be non-directional; see the discussion 
in Section 10. 
2.3 Interpreting Mutex Propagation from the CSP View 

Viewing the planning graph as a constraint satisfaction problem helps put the mutex propagation 
in a clearer perspective (see (Kambhampati et al., 1997)). Specifically, the way Graphplan constructs 
its planning graph, it winds up enforcing partial directed 1-consistency and partial directed 
2-consistency (Tsang, 1993). The partial 1-consistency is ensured by the graph building procedure, 
which introduces an action at level l only if the action's preconditions are present in the proposition 
list at level l - 1 and are not mutually exclusive. Partial 2-consistency is ensured by the mutual 
exclusion propagation procedure. 

In particular, the Graphplan planning graph construction implicitly derives "no-good"⁵ constraints 
of the form: 

$\neg Active(P_m^i)$ (or, written as a no-good, $P_m^i \neq \perp$)

in which case $P_m^i$ is simply removed from (or is not put into) level $i$, and mutex 
constraints of the form: 

$\neg (Active(P_m^i) \wedge Active(P_n^i))$ (or, written as a no-good, $P_m^i \neq \perp \wedge P_n^i \neq \perp$)

in which case $P_m^i$ and $P_n^i$ are marked mutually exclusive. 

Both procedures are "directed" in that they only use "reachability" analysis in enforcing the consistency, 
and are "partial" in that they do not enforce either full 1-consistency or full 2-consistency. 
Lack of full 1-consistency is verified by the fact that the appearance of a goal at level k does not 
necessarily mean that the goal is actually achievable by level k (i.e., that there is a solution for the CSP 
that assigns a non-$\perp$ value to that goal at that level). Similarly, lack of full 2-consistency is verified 
by the fact that the appearance of a pair of goals at level k does not imply that there is a plan for 
achieving both goals by that level. 

There is another, somewhat less obvious, way in which the consistency enforcement used in 
Graphplan is partial (and very conservative): it concentrates only on whether a single goal variable 
or a pair of goal variables can simultaneously have non-$\perp$ values (be active) in a solution. It may 
be that a goal can have a non-$\perp$ value, but not all non-$\perp$ values are feasible. Similarly, it may be 
that a pair of goals are achievable, but not necessarily achievable with every possible pair of actions 
in their respective domains. 

This interpretation of the mutex propagation procedure in Graphplan brings to the fore several possible 
extensions worth considering for Graphplan: 

1. Explore the utility of directional consistency enforcement procedures that are not based solely 
on reachability analysis. Kambhampati et al. (1997) argue for extending this analysis using 
relevance information, and Do et al. (2000) provide an empirical analysis of the effectiveness 
of consistency enforcement through relevance information. 

2. Explore the utility of enforcing higher level consistency. As pointed out in (Kambhampati 
et al., 1997; Kambhampati, 1998), the memoization strategies can be seen as failure-driven 
procedures that incrementally enforce partial higher level consistency. 

5. Normally, in the CSP literature, a no-good is seen as a compound assignment that cannot be part of any feasible 
solution. With this view, mutex constraints of the form $P_m^i \neq \perp \wedge P_n^i \neq \perp$ correspond to a conjunction of no-goods of 
the form $P_m^i = a_u \wedge P_n^i = a_v$, where $a_u$ and $a_v$ are values in the domains of $P_m^i$ and $P_n^i$. 

3. Consider relaxing the focus on non-$\perp$ values alone, and allow derivation of no-goods of the 
form 

$P_m^i = a_u \wedge P_n^i = a_v$

This is not guaranteed to be a winning idea, as the number of derived no-goods can increase 
quite dramatically. In particular, assuming that there are $l$ levels in the planning graph, 
an average of $m$ goals per level, and an average of $d$ actions supporting each goal, the maximum 
number of Graphplan-style pair-wise mutexes will be $O(l \cdot m^2)$, while the 2-size no-goods of 
the type discussed here will be $O(l \cdot (m \cdot (d+1))^2)$. We consider a similar issue in the context 
of the Graphplan memoization strategy in Section 6. 
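
To get a feel for the growth estimate in item 3, the two bounds can be evaluated for some illustrative values. The numbers below (10 levels, 10 goals per level, 3 supporting actions per goal) are chosen purely for illustration and are not taken from any experiment in this paper.

```latex
% Illustrative values only: l = 10, m = 10, d = 3.
\begin{align*}
l \cdot m^2 &= 10 \cdot 10^2 = 1000 \\
l \cdot (m \cdot (d+1))^2 &= 10 \cdot (10 \cdot 4)^2 = 16000 \\
\frac{l \cdot (m \cdot (d+1))^2}{l \cdot m^2} &= (d+1)^2 = 16
\end{align*}
```

The ratio between the two bounds is $(d+1)^2$ regardless of $l$ and $m$, so the blow-up is governed by the average number of supporting actions per goal.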

3. Some Inefficiencies of Graphplan's Backward Search 

To motivate the need for EBL and DDB, we shall first review the details of Graphplan's backward 
search, and pinpoint some of its inefficiencies. We shall base our discussion on the example planning 
graph from Figure 3 (which is reproduced for convenience from Figure 1). Assuming that $G_1 \cdots G_4$ 
are the top level goals of the problem we are interested in solving, we start at level k, and select 
actions to support the goals $G_1 \cdots G_4$. To keep matters simple, we shall assume that the search 
assigns the conditions (variables) at each level from top to bottom (i.e., $G_1$ first, then $G_2$ and so 
on). Further, when there is a choice in the actions (values) that can support a condition, we will 
consider the top actions first. Since there is only one choice for each of the conditions at this level, 
and none of the actions are mutually exclusive with each other, we select the actions $A_1$, $A_2$, $A_3$ 
and $A_4$ for supporting the conditions at level k. We now have to make sure that the preconditions 
of $A_1$, $A_2$, $A_3$, $A_4$ are satisfied at level k - 1. We thus subgoal on the conditions $P_1 \cdots P_6$ at level 
k - 1, and recursively start the action selection for them. We select the action $A_5$ for $P_1$. For $P_2$, 
we have two supporting actions, and using our convention, we select $A_6$ first. For $P_3$, $A_7$ is the 
only choice. When we get down to selecting a support for $P_4$, we again have a choice. Suppose 
we select $A_8$ first. We find that this choice is infeasible as $A_8$ is mutually exclusive with $A_6$, which is 
already chosen. So, we backtrack and choose $A_9$, and find that it too is mutually exclusive with a 
previously selected action, $A_5$. We now are stymied as there are no other choices for $P_4$. So, we 
have to backtrack and undo choices for the previous conditions. Graphplan uses a chronological 
backtracking approach, whereby it first tries to see if $P_3$ can be re-assigned, and then $P_2$ and so on. 
Notice the first indication of inefficiency here: the failure to assign $P_4$ had nothing to do with the 
assignment for $P_3$, and yet, chronological backtracking will try to re-assign $P_3$ in the vain hope of 
averting the failure. This would have led to a large amount of wasted effort had $P_3$ 
indeed had other choices. 

[Figure: planning graph with columns, left to right: action list (level k - 1), proposition list (level k - 1), action list (level k), proposition list (level k)] 

Figure 3: The running example used to illustrate EBL/DDB in Graphplan 

As it turns out, we find that $P_3$ has no other choices and backtrack over it. $P_2$ does have another 
choice, $A_{11}$. We try to continue the search forward with this value for $P_2$, but hit an impasse at $P_3$, 
since the only value of $P_3$, $A_7$, is mutex with $A_{11}$. At this point, we backtrack over $P_3$, and continue 
backtracking over $P_2$ and $P_1$, as they too have no other remaining choices. When we backtrack over 
$P_1$, we need to go back to level k and try to re-assign the goals at that level. Before this is done, the 
Graphplan search algorithm makes a "memo" signifying the fact that it failed to satisfy the goals 
$P_1 \cdots P_6$ at this level, with the hope that if the search ever subgoals on this same set of goals in 
the future, we can scuttle it right away with the help of the remembered memo. Here is the second 
indication of inefficiency: we are remembering all the subgoals $P_1 \cdots P_6$ even though we can see 
that the problem lies in trying to assign $P_1$, $P_2$, $P_3$ and $P_4$ simultaneously, and has nothing to do 
with the other subgoals. If we remember $\{P_1, P_2, P_3, P_4\}$ as the memo as against $\{P_1 \cdots P_6\}$, the 
remembered memo would be more general, and would have a much better chance of being useful 
in the future. 
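
The point about memo generality can be made concrete with a small sketch. It assumes that a smaller memo is allowed to prune any goal set that contains it (which is sound, since a superset of an unsatisfiable goal set is also unsatisfiable), whereas the full-goal-set memo is matched only when the same goal set recurs, following the wording above. The function names and the extra subgoal `P7` are hypothetical.

```python
def pruned_by_exact_memo(goals, memos):
    """Memo fires only when the very same goal set recurs."""
    return frozenset(goals) in memos

def pruned_by_subset_memo(goals, memos):
    """Memo fires whenever the goal set contains a remembered failing subset."""
    goals = frozenset(goals)
    return any(memo <= goals for memo in memos)

# Memo stored by standard Graphplan versus the smaller, more general memo.
graphplan_memos = {frozenset({"P1", "P2", "P3", "P4", "P5", "P6"})}
general_memos = {frozenset({"P1", "P2", "P3", "P4"})}

# A later goal set at the same level; P7 is a hypothetical extra subgoal.
later_goals = {"P1", "P2", "P3", "P4", "P7"}

exact_hit = pruned_by_exact_memo(later_goals, graphplan_memos)
large_memo_hit = pruned_by_subset_memo(later_goals, graphplan_memos)
small_memo_hit = pruned_by_subset_memo(later_goals, general_memos)
```

Only the smaller memo prunes the later goal set; the six-goal memo does not apply even under subset matching, because $P_5$ and $P_6$ are absent from the new goal set.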

After the memo is stored, the backtracking continues into level k, once again in a chronological 
fashion, trying to reassign $G_4$, $G_3$, $G_2$ and $G_1$ in that order. Here we see the third indication of inefficiency 
caused by chronological backtracking: $G_3$ really has no role in the failure we encountered 
in assigning $P_3$ and $P_4$, since it only spawns the condition $P_5$ at level k - 1. Yet, the backtracking 
scheme of Graphplan considers reassigning $G_3$. A somewhat more subtle point is that reassigning 
$G_4$ is not going to avert the failure either. Although $G_4$ requires $P_1$, one of the conditions taking 
part in the failure, $P_1$ is also required by $G_1$, and unless $G_1$ gets reassigned, considering further 
assignments to $G_4$ is not going to avert the failure. 

For this example, we continue backtracking over $G_2$ and $G_1$ too, since they too have no alternative 
supports, and finally memoize $\{G_1, G_2, G_3, G_4\}$ at this level. At this point the backward search 
fails, and Graphplan extends the planning graph by another level before re-initiating the backward 
search on the extended graph. 

4. Improving Backward Search with EBL and DDB 

I will now describe how Graphplan's backward search can be augmented with full-fledged EBL 
and DDB capabilities to eliminate the inefficiencies pointed out in the previous section. Informally, 
EBL/DDB strategies involve explanation of failures at leaf nodes, and regression and propagation 
of leaf node failure explanations to compute interior node failure explanations, along the lines described 
in (Kambhampati, 1998). The specific extensions I propose to the backward search can 
essentially be seen as adapting the conflict-directed backjumping strategy (Prosser, 1993), and generalizing 
it to work with dynamic constraint satisfaction problems. 

The algorithm is shown in pseudo-code form in Figure 4. It contains two mutually recursive 
procedures, find-plan and assign-goals. The former is called once for each level of the 
planning-graph. It then calls assign-goals to assign values to all the required conditions at that 
level. assign-goals picks a condition, selects a value for it, and recursively calls itself with 
the remaining conditions. When it is invoked with an empty set of conditions to be assigned, it calls 
find-plan to initiate the search at the next (previous) level. 

In order to illustrate how EBL/DDB capabilities are added, let's retrace the previous example, 
and pick up at the point where we are about to assign P4 at level k — 1, having assigned Pi, P 2 and 
P3. When we try to assign the value Ag to P4, we violate the mutex constraint between and Ag. 
An explanation of failure for a search node is a set of constraints from which False can be derived. 
The complete explanation for this failure can thus be stated as: 

P 2 = A 6 AP 4 = A 8 A (P 2 = A 6 ^P 4 ^ A 8 ) 

Of this, the part $P_2 = A_6 \Rightarrow P_4 \neq A_8$ can be stripped from the explanation since the mutual 
exclusion relation will hold as long as we are solving this particular problem with these particular 
actions. Further, we can take a cue from the conflict directed backjumping algorithm (Prosser, 
1993), and represent the remaining explanation compactly in terms of "conflict sets." Specifically, 
whenever the search reaches a condition c (and is about to find an assignment for it), its conflict 
set is initialized as {c}. Whenever one of the possible assignments to c is inconsistent (mutually 
exclusive) with the current assignment of a previous variable c', we add c' to the conflict set of c. In 
the current example, we start with {P4} as the conflict set of P4, and expand it by adding P2 after 
we find that A8 cannot be assigned to P4 because of the choice of A6 to support P2. Informally, 
the conflict set representation can be seen as an incrementally maintained (partial) explanation of 
failure, indicating that there is a conflict between the current value of P2 and one of the possible 
values of P4 (Kambhampati, 1998). 
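
The bookkeeping just described can be illustrated with a minimal Python sketch (Python is used only for illustration; it is not the language of the implementation discussed in this paper). The function name `conflict_set_for` and the dictionary and set encodings of the current assignment and the mutex relation are assumptions of the sketch. Following Figure 4, only one blocking condition is recorded per rejected value.

```python
def conflict_set_for(cond, candidates, assigned, mutex):
    """Try each candidate action for `cond`; return (action, conflict_set).

    `assigned` maps earlier conditions to their chosen actions, and `mutex`
    is a set of frozenset({a, b}) pairs of mutually exclusive actions.
    Both encodings are assumptions made for this illustration.
    """
    conflict = {cond}                      # the conflict set starts as {c}
    for action in candidates:
        blockers = [c for c, chosen in assigned.items()
                    if frozenset((action, chosen)) in mutex]
        if not blockers:
            return action, conflict        # a consistent value was found
        conflict.add(blockers[0])          # record one condition that ruled it out
    return None, conflict                  # every value failed


# Data taken from the running example: P1 = A5, P2 = A6, P3 = A7 are assigned.
mutex = {frozenset(("A6", "A8")), frozenset(("A5", "A9"))}
assigned = {"P1": "A5", "P2": "A6", "P3": "A7"}
action, conflict = conflict_set_for("P4", ["A8", "A9"], assigned, mutex)
print(action, sorted(conflict))            # None ['P1', 'P2', 'P4']
```

The returned set names P1 and P2 but not P3, which is what later allows the search to skip over P3 when it backtracks.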

We now consider the second possible value of P4, viz., A9, and find that it is mutually exclusive 
with A5, which is currently supporting P1. Following our practice, we add P1 to the conflict set of 
P4. At this point, there are no further choices for P4, and so we backtrack from P4, passing the 
conflict set of P4, viz., {P1, P2, P4}, as the reason for its failure. In essence, the conflict set is a 
shorthand notation for the following complete failure explanation (Kambhampati, 1998): 6 

$$[(P_4 = A_8) \lor (P_4 = A_9)] \land (P_1 = A_5 \Rightarrow P_4 \neq A_9) \land (P_2 = A_6 \Rightarrow P_4 \neq A_8) \land P_1 = A_5 \land P_2 = A_6$$

It is worth noting at this point that when P4 is revisited in the future with different assignments 
to the preceding variables, its conflict set will be re-initialized to {P4} before considering any as- 
signments to it. 

The first advantage of the conflict set is that it allows a transparent way of supporting depen- 
dency directed backtracking (Kambhampati, 1998). In the current example, having failed to assign 
P4, we have to start backtracking. We do not need to do this in a chronological fashion however. 

6. We strip the first (disjunctive) clause since it is present in the graph structure, and the next two implicative clauses 
since they are part of the mutual exclusion relations that will not change for this problem. The conflict set 
representation just keeps the condition (variable) names of the last two clauses, denoting, in essence, that it is the 
current assignments of the variables P1 and P2 that are causing the failure to assign P4. 

Find-Plan(G: goals, pg: plan graph, k: level) 
    If k = 0, Return an empty subplan P with success. 
    If there is a memo M such that M ⊆ G, 
        Fail, and return M as the conflict set 
    Call Assign-goals(G, pg, k, ∅). 
    If Assign-goals fails and returns a conflict set M, 
        Store M as a memo 
        Regress M over actions selected at level k + 1 to get R 
        Fail and return R as the conflict set 
    If Assign-goals succeeds, and returns a k-level subplan P, 
        Return P with success 

Assign-goals(G: goals, pg: plan graph, k: level, A: actions) 
    If G = ∅ 
        Let U be the union of preconditions of the actions in A 
        Call Find-plan(U, pg, k - 1) 
        If Find-plan fails and returns a conflict set R, 
            Fail and return R 
        If Find-plan succeeds and returns a subplan P of length k - 1 
            Succeed and return a k length subplan P · A 
    Else ;;(G ≠ ∅) 
        Select a goal g ∈ G 
        Let cs ← {g}, and A_g be the set of actions from level k in pg that support g 
        L1: If A_g = ∅, Fail and return cs as the conflict set 
            Else, pick an action a ∈ A_g, and set A_g ← A_g - a 
                If a is mutually exclusive with some action b ∈ A 
                    Let l be the goal that b was selected to support 
                    Set cs ← cs ∪ {l} 
                    Goto L1 
                Else (a is not mutually exclusive with any action in A) 
                    Call Assign-goals(G - {g}, pg, k, A ∪ {a}) 
                    If the call fails and returns a conflict set C 
                        If g ∈ C 
                            Set cs = cs ∪ C ;conflict set absorption 
                            Goto L1 
                        Else ;(g ∉ C) 
                            Fail and return C as the conflict set 
                            ;dependency directed backjumping 



Figure 4: A pseudo-code description of Graphplan backward search enhanced with EBL/DDB 
capabilities. The backward search at level k in a planning-graph pg is initiated with the call 
Find-Plan(G, pg, k), where G is the set of top level goals of the problem. 

Instead, we jump back to the most recent variable (condition) taking part in the conflict set of P4, 
in this case P2. By doing so, we are avoiding considering other alternatives at P3, and thus avoiding 
one of the inefficiencies of the standard backward search. It is easy to see that such backjumping is 
sound since P3 is not causing the failure at P4 and thus re-assigning it won't avert the failure. 

Continuing along, whenever the search backtracks to a condition c, the backtrack conflict is 
absorbed into the current conflict set of c. In our example, we absorb {P1, P2, P4} into the conflict 
set of P2, which is currently {P2} (making {P1, P2, P4} the new conflict set of P2). We now assign 
A11, the only remaining value, to P2. Next we try to assign P3 and find that its only value A7 is 
mutex with A11. Thus, we set the conflict set of P3 to be {P3, P2} and backtrack with this conflict 
set. When the backtracking reaches P2, this conflict set is absorbed into the current conflict set of 
P2 (as described earlier), giving rise to {P1, P2, P3, P4} as the current combined failure reason for 
P2. This step illustrates how the conflict set of a condition is incrementally expanded to collect the 
reasons for failure of the various possible values of the condition. 

At this point, P2 has no further choices, so we backtrack over P2 with its current conflict set, 
{P1, P2, P3, P4}. At P1, we first absorb the conflict set {P1, P2, P3, P4} into P1's current conflict 
set, and then re-initiate backtracking since P1 has no further choices. 

Now, we have reached the end of the current level (k - 1). Any backtracking over P1 must 
involve undoing assignments of the conditions at the kth level. Before we do that, however, we 
perform two steps: memoization and regression. 

4.1 Memoization 

Before we backtrack over the first assigned variable at a given level, we store the conflict set of that 
variable as a memo at that level. We store the conflict set {P1, P2, P3, P4} of P1 as a memo at this 
level. Notice that the memo we store is shorter (and thus more general) than the one stored by the 
normal Graphplan, as we do not include P5 and P6, which did not have anything to do with the 
failure. 7 

4.2 Regression 

Before we backtrack out of level k - 1 to level k, we need to convert the conflict set of (the first 
assigned variable in) level k - 1 so that it refers to the conditions in level k. This conversion 
process involves regressing the conflict set over the actions selected at the kth level (Kambhampati, 
1998). In essence, the regression step computes the (smallest) set of conditions (variables) at the 
kth level whose supporting actions spawned (activated, in DCSP terms) the conditions (variables) 
in the conflict set at level k - 1. In the current case, our conflict set is {P1, P2, P3, P4}. We can 
see that P2 and P3 are required because of the condition G1 at level k, and the condition P4 is required 
because of the condition G2. 

In the case of condition P1, both G1 and G4 are responsible for it, as both their supporting 
actions needed P1. In such cases we have two heuristics for computing the regression: (1) Prefer 
choices that help the conflict set to regress to a smaller set of conditions. (2) If we still have a choice 
between multiple conditions at level k, we pick the one that has been assigned earlier. The 
motivation for the first rule is to keep the failure explanations as compact (and thus as general) as possible, 

7. While in the current example, the memo includes all the conditions up to P4 (which is the farthest we have gone in 
this level), even this is not always necessary. We can verify that P3 would not have been in the memo set if A11 were 
not one of the supporters of P2. 
and the motivation for the second rule is to support deeper dependency directed backtracking. It 
is important to note that these heuristics are aimed at improving the performance of the EBL/DDB 
and do not affect the soundness and completeness of the approach. 

In the current example, the first of these rules applies, since P1 is already required by G1, which 
is also requiring P2 and P3. Even if this was not the case (i.e., G1 only required P1), we still would 
have selected G1 over G4 as the regression of P1, since G1 was assigned earlier in the search. 

The result of regressing {P1, P2, P3, P4} over the actions at the kth level is thus {G1, G2}. We start 
backtracking at level k with this as the conflict set. We jump back to G2 right away, since it is the 
most recent variable named in the conflict set. This avoids the inefficiency of re-considering the 
choices at G3 and G4, as done by the normal backward search. At G2, the backtrack conflict set 
is absorbed, and the backtracking continues since there are no other choices. The same procedure is 
repeated at G1. At this point, we are once again at the end of a level, and we memoize {G1, G2} 
as the memo at level k. Since there are no other levels to backtrack to, Graphplan is called on to 
extend the planning-graph by one more level. 

Notice that the memos based on EBL analysis capture failures that may require a significant 
amount of search to rediscover. In our example, we are able to discover that {G1, G2} is a failing 
goal set despite the fact that there are no mutex relations between the choices of the goals G1 and 
G2. 
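
To make the interplay of conflict sets, backjumping, memoization and regression concrete, here is a compact Python sketch of the procedure of Figure 4. It is a simplified illustration under stated assumptions, not the Lisp implementation evaluated later: the `PlanGraph` container and its field names are hypothetical, the regression uses a greedy reading of the two heuristics above, and a memo hit is regressed before being returned (as discussed in Section 4.3). The example data follow the running example as far as the text states it; the level-k action names A1 to A4 and the supporters of P5 and P6 are assumptions.

```python
class PlanGraph:
    """Toy planning graph; every field name is an assumption of this sketch."""
    def __init__(self, supporters, preconds, mutex):
        self.supporters = supporters  # supporters[k][goal] -> actions at level k
        self.preconds = preconds      # preconds[action] -> conditions one level down
        self.mutex = mutex            # mutex[k] -> set of frozenset({a, b}) pairs
        self.memos = {}               # memos[k] -> list of frozensets of conditions


def regress(conflict, chosen, preconds):
    """Map a conflict set to goals of the level above.

    `chosen` lists the (goal, action) pairs of the level above in assignment order.
    """
    if not chosen:                    # top level: nothing to regress over
        return frozenset(conflict)
    owners = {p: [g for g, a in chosen if p in preconds[a]]
              for p in sorted(conflict)}
    result = {gs[0] for gs in owners.values() if len(gs) == 1}
    for gs in owners.values():
        # Heuristic 1: reuse a goal that is already in the result if possible.
        # Heuristic 2: otherwise take the goal that was assigned earliest.
        if gs and not result.intersection(gs):
            result.add(gs[0])
    return frozenset(result)


def find_plan(pg, goals, k, above=()):
    """Return (plan, conflict_set); plan is None when the search fails."""
    if k == 0:
        return [], frozenset()
    memos = pg.memos.setdefault(k, [])
    for memo in memos:
        if memo <= set(goals):        # a stored memo is a subset of the goals
            return None, regress(memo, above, pg.preconds)
    plan, conflict = assign_goals(pg, list(goals), k, [])
    if plan is None:
        memos.append(conflict)        # memoize before leaving the level
        return None, regress(conflict, above, pg.preconds)
    return plan, frozenset()


def assign_goals(pg, goals, k, chosen):
    """Assign supporters to `goals` at level k, given earlier choices `chosen`."""
    if not goals:
        subgoals = sorted(set().union(*(pg.preconds[a] for _, a in chosen)))
        plan, conflict = find_plan(pg, subgoals, k - 1, chosen)
        if plan is None:
            return None, conflict
        return plan + [sorted({a for _, a in chosen})], frozenset()
    goal, rest = goals[0], goals[1:]
    cs = {goal}
    level_mutex = pg.mutex.get(k, set())
    for action in pg.supporters[k].get(goal, []):
        blockers = [g for g, b in chosen if frozenset((action, b)) in level_mutex]
        if blockers:
            cs.add(blockers[0])       # remember which earlier goal blocked us
            continue
        plan, conflict = assign_goals(pg, rest, k, chosen + [(goal, action)])
        if plan is not None:
            return plan, frozenset()
        if goal not in conflict:
            return None, conflict     # dependency directed backjumping
        cs |= conflict                # conflict set absorption
    return None, frozenset(cs)


def example_graph():
    """Two-level graph shaped like the running example; level 2 plays level k."""
    supporters = {
        2: {"G1": ["A1"], "G2": ["A2"], "G3": ["A3"], "G4": ["A4"]},
        1: {"P1": ["A5"], "P2": ["A6", "A11"], "P3": ["A7"],
            "P4": ["A8", "A9"], "P5": ["A10"], "P6": ["A10"]},
    }
    preconds = {"A1": {"P1", "P2", "P3"}, "A2": {"P4"}, "A3": {"P5"},
                "A4": {"P1", "P6"}}
    preconds.update({f"A{i}": set() for i in range(5, 12)})
    mutex = {1: {frozenset(("A6", "A8")), frozenset(("A5", "A9")),
                 frozenset(("A7", "A11"))}}
    return PlanGraph(supporters, preconds, mutex)


pg = example_graph()
plan, conflict = find_plan(pg, ["G1", "G2", "G3", "G4"], 2)
print(plan, sorted(conflict))         # None ['G1', 'G2']
print(pg.memos)
```

On this data the sketch stores {P1, P2, P3, P4} as the memo at the lower level and {G1, G2} at the upper level, matching the walk-through above. Goals are processed in the order given, so no dynamic variable ordering is involved.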

4.3 Using the Memos 

Before we end this section, there are a couple of observations regarding the use of the stored memos. 
In the standard Graphplan, memos at each level are stored in a level-specific hash table. Whenever 
backward search reaches a level k with a set of conditions to be satisfied, it consults the hash table 
to see if this exact set of conditions is stored as a memo. Search is terminated only if an exact hit 
occurs. Since EBL analysis allows us to store compact memos, it is not likely that a complete goal 
set at some level k is going to exactly match a stored memo. What is more likely is that a stored 
memo is a subset of the goal set at level k (which is sufficient to declare that goal set a failure). 
In other words, the memo checking routine in Graphplan needs to be modified so that it checks to 
see if some subset of the current goal set is stored as a memo. The naive way of doing it, which 
involves enumerating all the subsets of the current goal set and checking if any of them are in the 
hash table, turns out to be very costly. One needs more efficient data structures, such as 
set-enumeration trees (Rymon, 1992). Indeed, Koehler and her co-workers (Koehler, Nebel, Hoffman, 
& Dimopoulos, 1997) have developed a data structure called UB-Trees for storing the memos. The 
UB-Tree structures can be seen as a specialized version of the "set-enumeration trees," and they can 
efficiently check if any subset of the current goal set has been stored as a memo. 
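
As a rough illustration of subset-based memo checking, the Python sketch below stores each memo as a path of sorted condition names in a trie and answers the query "is some stored memo a subset of this goal set?" without enumerating subsets. This is a simplified set-trie in the spirit of set-enumeration trees; it is not the UB-Tree of Koehler et al., and the class and method names are assumptions.

```python
class MemoTrie:
    """Set-trie for memos; each memo is stored as a path of sorted conditions."""

    def __init__(self):
        self.children = {}     # condition name -> child node
        self.is_memo = False   # True if the path from the root to here is a memo

    def add(self, memo):
        node = self
        for cond in sorted(memo):
            node = node.children.setdefault(cond, MemoTrie())
        node.is_memo = True

    def has_subset_of(self, goals):
        """Return True if some stored memo is a subset of `goals`."""
        return self._search(sorted(goals), 0)

    def _search(self, items, start):
        if self.is_memo:
            return True
        # Only follow edges labelled with goals that come later in sorted order.
        for i in range(start, len(items)):
            child = self.children.get(items[i])
            if child is not None and child._search(items, i + 1):
                return True
        return False


store = MemoTrie()
store.add({"G1", "G2"})
print(store.has_subset_of({"G1", "G2", "G3", "G4"}))   # True
print(store.has_subset_of({"G1", "G3", "G4"}))         # False
```

The search only descends along conditions that actually occur in the goal set, so the work depends on the stored memos that share a prefix with the query rather than on the number of subsets of the goal set.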

The second observation regarding memos is that they can often serve as a failure explanation 
in themselves. Suppose we are at some level k, and find that the goal set at this level subsumes 
some stored memo M. We can then use M as the failure explanation for this level, and regress it 
back to the previous level. Such a process can provide us with valuable opportunities for further 
backjumping at levels above k. It also allows us to learn new compact memos at those levels. Note 
that none of this would have been possible with normal memos stored by Graphplan, as the only 
way a memo can declare a goal set at level k as failing is if the memo is exactly equal to the goal 
set. In such a case regression will just get us all the goals at level k + 1, and does not buy us any 
backjumping or learning power (Kambhampati, 1998). 

5. Empirical Evaluation of the Effectiveness of EBL/DDB 

We have now seen the way EBL and DDB capabilities are added to the backward search by main- 
taining and updating conflict-sets. We also noted that EBL and DDB capabilities avoid a variety 
of inefficiencies in the standard Graphplan backward search. That these augmentations are sound- 
ness and completeness preserving follows from the corresponding properties of conflict-directed 
backjumping (Kambhampati, 1998). The remaining (million-dollar) question is whether these ca- 
pabilities make a difference in practice. I now present a set of empirical results to answer this 
question. 

I implemented the EBL/DDB approach described in the previous section on top of a Graphplan 
implementation in Lisp. 8 The changes needed to the code to add EBL/DDB capability were 
relatively minor: only two functions needed non-trivial changes. 9 I also added the UB-Tree subset 
memo checking code described in (Koehler et al., 1997). I then ran several comparative experiments 
on the "benchmark" problems from (Kautz & Selman, 1996), as well as from four other domains. 
The specific domains included blocks world, rocket world, logistics domain, gripper domain, ferry 
domain, traveling salesperson domain, and Towers of Hanoi. Some of these domains, including the 
blocks world, the logistics domain and the gripper domain, were used in the recent AI Planning 
Systems competition. The specifications of the problems as well as domains are publicly available. 

Table 1 shows the statistics on the times taken and number of backtracks made by normal Graph- 
plan, and Graphplan with EBL/DDB capabilities. 10 

5.1 Run-Time Improvement 

The first thing we note is that EBL/DDB techniques can offer quite dramatic speedups, from 1.6x 
in blocks world all the way to 120x in the logistics domain (the Att-log-a problem is unsolvable by 
normal Graphplan after over 40 hours of cpu time!). We also note that the number of backtracks 
reduces significantly and consistently with EBL/DDB. Given the length of some of the runs, the time 
Lisp spends doing garbage collection becomes an important issue. I thus report the cumulative time 
(including cpu time and garbage collection time) for Graphplan with EBL/DDB, while I separate 
the cpu time from cumulative time for the plain Graphplan (in cases where the total time spent 
was large enough that garbage collection time is a significant fraction). Specifically, there are two 
entries in the column corresponding to total time for the normal Graphplan. The first entry is the 
cpu time spent, while the second entry, in parentheses, is the cumulative time (cpu time and garbage 
collection time) spent. The speedup is computed with respect to the cumulative time of Graphplan 
with EBL/DDB and cpu time of plain Graphplan. 11 The reported speedups should thus be seen as 
conservative estimates. 

8. The original Lisp implementation of Graphplan was done by Mark Peot. The implementation was subsequently 
improved by David Smith. 

9. Assign-goals and find-plan 

10. In the earlier versions of this paper, including the paper presented at IJCAI (Kambhampati, 1999), I reported 
experiments on a Sun SPARC Ultra 1 running Allegro Common Lisp 4.3. The Linux machine run-time statistics 
seem to be approximately 2.7x faster than those from the SPARC machine. 

11. It is interesting to note that the percentage of time spent doing garbage collection is highly problem dependent. For 
example, in the case of Att-log-a, only 30 minutes out of 41 hours (or about 1% of the cumulative time) was spent 
doing garbage collection, while in the case of Tower-6, 3.1 hours out of 4.8 hours (or about 65% of the cumulative 
time) was spent on garbage collection! 

Table 1: Empirical performance of EBL/DDB. Unless otherwise noted, times are in cpu minutes on a Pentium-III 500 MHz machine with 
256 MB RAM running Linux and Allegro Common Lisp 5, compiled for speed. "Tt" is total time, "Mt" is the time used in checking 
memos and "Btks" is the number of backtracks done during search. The times for Graphplan with EBL/DDB include both the cpu 
and garbage collection time, while the cpu time is separated from the total time in the case of normal Graphplan. The numbers in 
parentheses next to the problem names list the number of time steps and number of actions respectively in the solution. AvLn and 
AvFM denote the average memo length and average number of failures detected per stored memo respectively. 

5.2 Reduction in Memo Length 

The results also highlight the fact that the speedups offered by EBL/DDB are problem/domain 
dependent - they are quite meager in blocks world problems, and are quite dramatic in many other 
domains including the rocket world, logistics, ferry, gripper, TSP and Hanoi domains. The statistics 
on the memos, shown in Table 1 shed light on the reasons for this variation. Of particular interest 
is the average length of the stored memos (given in the columns labeled "AvLn"). In general, we 
expect that the EBL analysis reduces the length of stored memos, as conditions that are not part of 
the failure explanation are not stored in the memo. However, the advantage of this depends on the 
likelihood that only a small subset of the goals at a given level are actually taking part in the failure. 
This likelihood in turn depends on the amount of inter-dependencies between the goals at a given 
level. From the table, we note that the average length reduces quite dramatically in the rocket world 
and logistics 12 , while the reduction is much less pronounced in the blocks world. This variation can 
be traced back to a larger degree of inter-dependency between goals at a given level in the blocks 
world problems. 

The reduction in average memo length is correlated perfectly with the speedups offered by EBL 
on the corresponding problems. Let me put this in perspective. The fact that the average length 
of memos for the Rocket-ext-a problem is 8.5 with EBL and 24 without EBL shows, in essence, that 
normal Graphplan is re-discovering an 8-sized failure embedded in $\binom{24}{8}$ possible ways in the worst 
case in a 24-sized goal set, storing a new memo each time (incurring both increased backtracking 
and matching costs)! It is thus no wonder that normal Graphplan performs badly compared to 
Graphplan with EBL/DDB. 
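
To give a sense of the magnitude involved, the binomial coefficient above can be evaluated directly. The snippet is only a worked arithmetic check; it takes the failure size as 8 and the goal-set size as 24, the rounded figures used in the text.

```python
import math

# Number of distinct 24-element goal sets' 8-element subsets, i.e. "24 choose 8".
embeddings = math.comb(24, 8)
print(embeddings)   # 735471
```

In the worst case, then, a single 8-condition failure could be rediscovered and stored hundreds of thousands of times when only full-length memos are kept.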

5.3 Utility of Stored Memos 

The statistics in Table 1 also show the increased utility of the memos stored by Graphplan with 
EBL/DDB. Since EBL/DDB store more general (smaller) memos than normal Graphplan, they 
should, in theory, generate fewer memos and use them more often. The columns labeled "AvFM" 
give the ratio of the number of failures discovered through the use of memos to the number of memos 
generated in the first place. This can be seen as a measure of the average "utility" of the stored 
memos. We note that the utility is consistently higher with EBL/DDB. As an example, in Rocket- 
ext-b, we see that on the average an EBL/DDB generated memo was used to discover failures 101 
times, while the number was only 3.2 for the memos generated by the normal Graphplan. 13 

5.4 Relative Utility of EBL vs. DDB 

From the statistics in Table 1, we see that even though EBL can make significant improvements in 
run-time, a significant fraction of the run time with EBL (as well as normal Graphplan) is spent in 
memo checking. This raises the possibility that the overall savings are mostly from the DDB part 
and that the EBL part (i.e., the part involving storing and checking memos) is in fact a net drain 
(Kambhampati, Katukam, & Qu, 1997). To see if this is true, I ran some problems with EBL (i.e., 
memo-checking) disabled. The DDB capability as well as the standard Graphplan memoization 

12. For the case of Att-log-a, I took the memo statistics by interrupting the search after about 6 hours. 

13. The statistics for Att-log-a seem to suggest that memo usage was not as bad for normal Graphplan. However, it should 
be noted that Att-log-a was not solved by normal Graphplan to begin with. The improved usage factor may be due 
mostly to the fact that the search went for a considerably longer time, giving Graphplan more opportunity to use its 
memos. 

Table 2: Utility of storing and using EBL memos over just doing DDB 



strategies were left in. 14 The results are shown in Table 2, and demonstrate that the ability to store 
smaller memos (as afforded by EBL) is quite helpful, giving rise to a 120x speedup over DDB alone 
in the Att-log-a problem, and a 50x speedup in the Tower-6 problem. Of course, the results also show that 
DDB is an important capability in itself. Indeed, Att-log-a and Tower-6 could not even be solved by 
the standard Graphplan, while with DDB, these problems become solvable. In summary, the results 
show that both EBL and DDB can have a net positive utility. 

5.5 Utility of Memoization 

Another minor, but not well-recognized, point brought out by the statistics in Table 1 is that the 
memo checking can sometimes be a significant fraction of the run-time of standard Graphplan. For 
example, in the case of Rocket-ext-a, standard Graphplan takes 19.4 minutes of which 11.7 minutes, 
or over half the time, is spent in memo checking (in hash tables)! This raises the possibility that if 
we just disable the memoization, perhaps we can do just as well as the version with EBL/DDB. To 
see if this is the case, I ran some of the problems with memoization disabled. The results show that 
in general disabling memo-checking leads to worsened performance. While I came across some 
cases where the disablement reduces the overall run-time, the run-time is still much higher than 
what you get with EBL/DDB. As an example, in the case of Rocket-ext-a, if we disable the memo 
checking completely, Graphplan takes 16.5 minutes, which while lower than the 19.4 minutes taken 
by standard Graphplan, is still much higher than the 0.8 minutes taken by the version of Graphplan 
with EBL/DDB capabilities added. If we add DDB capability, while still disabling the memo- 
checking, the run time becomes 2.4 minutes, which is still 3 times higher than that afforded with 
EBL capability. 

5.6 The C vs. Lisp Question 

Given that most existing implementations of Graphplan are done in C with many optimizations, 
one nagging doubt is whether the dramatic speedups due to EBL/DDB are somehow dependent 
on the moderately optimized Lisp implementation I have used in my experiments. Thankfully, the 
EBL/DDB techniques described in this paper have also been (re)implemented by Maria Fox and 
Derek Long on their STAN system. STAN is a highly optimized implementation of Graphplan 
that fared well in the recent AIPS planning competition. They have found that EBL/DDB resulted 
in similar dramatic speedups on their system too (Fox, 1998; Fox & Long, 1999). For example, 

14. I also considered removing the memoization completely, but the results were even poorer. 
they were unable to solve Att-log-a with plain Graphplan, but could solve it easily with EBL/DDB 
added. 

Finally, it is worth pointing out that even with EBL/DDB capabilities, I was unable to solve 
some larger problems in the AT&T benchmarks, such as bw-large-c and Att-log-b. This is however 
not an indictment against EBL/DDB since to my knowledge the only planners that solved these 
problems have all used either local search strategies such as GSAT, randomized re-start strategies, 
or have used additional domain-specific knowledge and pre-processing. At the very least, I am not 
aware of any existing implementations of Graphplan that solve these problems. 

6. On the Utility of Graphplan Memos 

One important issue in using EBL is managing the costs of storage and matching. Indeed, as dis- 
cussed in (Kambhampati, 1998), naive implementations of EBL/DDB are known to lose the gains 
made in pruning power in the matching and storage costs. Consequently, several techniques have 
been invented to reduce these costs through selective learning as well as selective forgetting. It is 
interesting to see why these costs have not been as prominent an issue for EBL/DDB on Graphplan. 
I think this is mostly because of two characteristics of Graphplan memoization strategy: 

1. Graphplan's memoization strategy provides a very compact representation for no-goods, as 
well as a very selective strategy for remembering no-goods. Seen as a DCSP, it only remembers 
subsets of activated variables that do not have a satisfying assignment. Seen as a CSP (cf. 
Figure 2), Graphplan only remembers no-goods of the form 

$$P_1^i \neq \bot \land P_2^i \neq \bot \land \cdots \land P_m^i \neq \bot$$

(where the superscripts correspond to the level of the planning graph to which the proposition 
belongs), while normal EBL implementations learn no-goods of the form 

$$P_1^i = a_1 \land P_2^j = a_2 \land \cdots \land P_m^k = a_m$$

Suppose a planning graph contains n propositions divided into l levels, and each proposition 
P at level j has at most d actions supporting it. A CSP compilation of the planning graph will 
have n variables, each with d + 1 values (the extra one for $\bot$). A normal EBL implementation 
for such a CSP can learn, in the worst case, $(d + 2)^n$ no-goods. 15 In contrast, Graphplan 
remembers only $l \cdot 2^{n/l}$ memos, 16 a very dramatic reduction. This reduction is a result of two 
factors: 

(a) Each individual memo stored by Graphplan corresponds to an exponentially large set of 
normal no-goods (the memo 

$$P_1^i \neq \bot \land P_2^i \neq \bot \land \cdots \land P_m^i \neq \bot$$

is a shorthand notation for the conjunction of $d^m$ no-goods corresponding to all possible 
non-$\bot$ assignments to $P_1^i \cdots P_m^i$). 

15. Each variable v may either not be present in a no-good, or be present with one of d + 1 possible assignments, giving 
d + 2 possibilities for each of n variables. 

16. At each level, each of n/l propositions either occurs in a memo or does not occur. 
(b) Memos only subsume no-goods made up of proposition variables from the same 
planning graph level. 

2. The matching cost is reduced both by the fact that considerably fewer no-goods are ever 
learned, and by the fact that Graphplan stores no-goods (memos) separately for each level, and 
only consults the memos stored at level j while doing backward search at level j. 

The above discussion throws some light on why the so-called "EBL utility" problem is not as 
critical for Graphplan as it is for EBL done on normal CSPs. 
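
The gap between the two worst-case bounds is easy to underestimate, so a small numeric illustration may help. The Python snippet below simply evaluates the two expressions from the discussion above; the function names and the sample values n = 60, l = 6 and d = 3 are arbitrary assumptions, and the memo bound assumes that the n propositions are spread evenly over the l levels.

```python
def nogood_bound(n, d):
    """Worst-case no-goods for a CSP with n variables and d + 1 values each."""
    return (d + 2) ** n


def memo_bound(n, levels):
    """Worst-case Graphplan memos, assuming n / levels propositions per level."""
    return levels * 2 ** (n // levels)


print(memo_bound(60, 6))                  # 6144
print(nogood_bound(60, 3) == 5 ** 60)     # True
print(nogood_bound(60, 3) // memo_bound(60, 6) > 10 ** 30)   # True
```

Even for this modest graph the bound on ordinary no-goods exceeds the bound on memos by more than thirty orders of magnitude.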

6.1 Scenarios Where Memoization is too Conservative to Avoid Rediscovery of the Same 
Failures 

The discussion above also raises the possibility that Graphplan (even with EBL/DDB) memoization 
is too conservative and may be losing some useful learning opportunities only because they are not 
in the required syntactic form. Specifically, before Graphplan can learn a memo of the form 

$$P_1^i \neq \bot \land P_2^i \neq \bot \land \cdots \land P_m^i \neq \bot$$

it must be the case that each of the $d^m$ possible assignments to the m propositional variables is 
a no-good. Even if one of them is not a no-good, Graphplan avoids learning the memo, thus 
potentially repeating the failing searches at a later time (although the loss is made up to some extent 
by learning several memos at a lower level). 

Consider for example the following scenario: we have a set of variables $P_1^i \cdots P_m^i \cdots P_n^i$ at some 
level i that are being assigned by backward search. Suppose the search has found a legal partial 
assignment for the variables $P_1^i \cdots P_{m-1}^i$, and the domain of $P_m^i$ contains the k values {v1, ..., vk}. In 
trying to assign the variables $P_m^i \cdots P_n^i$, suppose we repeatedly fail and backtrack up to the variable 
$P_m^i$, re-assigning it and eventually settling at the value v7. At this point once again backtracking 
occurs, but this time we backtrack over $P_m^i$ to higher level variables ($P_1^i \cdots P_{m-1}^i$) and re-assign 
them. At this point, it would have been useful to remember some no-goods to the effect that none 
of the first 6 values of $P_m^i$ are going to work, so that all that backtracking does not have to be repeated. 
Such no-goods will take the form: 

$$P_m^i = v_j \land P_{m+1}^i \neq \bot \land P_{m+2}^i \neq \bot \land \cdots \land P_n^i \neq \bot$$

where j ranges over 1, ..., 6, for all the values of $P_m^i$ that were tried and found to lead to failure while 
assigning the later variables. Unfortunately, such no-goods are not in the syntactic form of memos 
and so the memoization procedure cannot remember them. The search is thus forced to rediscover 
the same failures. 

6.2 Sticky Values as a Partial Antidote 

One way of staying with the standard memoization, but avoiding rediscovery of the failing search 
paths, such as those in the case of the example above, is to use the "sticky values" heuristic (Frost 
& Dechter, 1994; Kambhampati, 1998). This involves remembering the current value of a variable 
while skipping over it during DDB, and trying that value first when the search comes back to that 
variable. The heuristic is motivated by the fact that when we skip over a variable during DDB, it 
means that the variable and its current assignment have not contributed to the failure that caused 
the backtracking, so it makes sense to restore this value upon re-visit. In the example above, this 
heuristic will remember that v7 was the current value of $P_m^i$ when we backtracked over it, and tries 
that as the first value when it is re-visited. A variation on this technique is to re-arrange or fold the 
domain of the variable such that all the values that precede the current value are sent to the back 
of the domain, so that these values will be tried only if other previously untried values are found to 
fail. This makes the assumption that the values that led to failure once are likely to do so again. In 
the example above, this heuristic folds the domain of $P_m^i$ so it becomes {v7, v8, ..., vk, v1, v2, ..., v6}. 
Notice that both these heuristics make sense only if we employ DDB, as otherwise we will never 
skip over any variable during backtracking. 

I implemented both sticky value heuristics on top of EBL/DDB for Graphplan. The statistics 
in Table 3 show the results of experiments with this extension. As can be seen, the sticky values 
approach is able to give up to 4.6x additional speedup over EBL/DDB depending on the problem. 
Further, while the folding heuristic dominates the simple version in terms of number of backtracks, 
the difference is quite small in terms of run-time. 
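
The two variants differ only in how the domain of a skipped variable is reordered when the search returns to it. A minimal Python sketch, assuming domains are plain lists and using hypothetical function names, is given below; the order of the non-sticky values in the simple variant is an assumption, since the description above fixes only which value is tried first.

```python
def sticky_first(domain, sticky):
    """Simple variant: retry the remembered value first, the rest in original order."""
    return [sticky] + [v for v in domain if v != sticky]


def fold_domain(domain, sticky):
    """Folding variant: rotate the domain so earlier-tried values go to the back."""
    i = domain.index(sticky)
    return domain[i:] + domain[:i]


domain = [f"v{i}" for i in range(1, 11)]   # k = 10 is an arbitrary choice
print(sticky_first(domain, "v7")[:3])      # ['v7', 'v1', 'v2']
print(fold_domain(domain, "v7"))
# ['v7', 'v8', 'v9', 'v10', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6']
```

With folding, the values v1 to v6 that already failed are revisited only after the untried values v8 to vk, which is consistent with the lower backtrack counts reported for that variant.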

7. Forward Checking & Dynamic Variable Ordering 

DDB and EBL are considered "look-back" techniques in that they analyze the failures by looking 
back at the past variables that may have played a part in those failures. There is a different class 
of techniques known as "look-forward" techniques for improving search. Prominent among these 
latter are forward checking and dynamic variable ordering. Supporting forward checking involves 
filtering out the conflicting actions from the domains of the remaining goals, as soon as a particular 
goal is assigned. In the example in Figure 1, forward checking will filter A9 from the domain of P4 
as soon as P1 is assigned A5. Dynamic variable ordering (DVO) involves selecting for assignment 
the goal that has the least number of remaining establishers. 17 When DVO is combined with forward 
checking, the variables are ordered according to their "live" domain sizes (where live domain 
is comprised of values from the domain that are not yet pruned by forward checking). Our experi- 
ments 18 show that these techniques can bring about reasonable, albeit non-dramatic, improvements 
in Graphplan's performance. Table 4 shows the statistics for some benchmark problems, with dy- 
namic variable ordering alone, and with forward checking and dynamic variable ordering. We note 
that while the backtracks reduce by up to 3.6x in the case of dynamic variable ordering, and 5x in the 
case of dynamic variable ordering and forward checking, the speedups in time are somewhat smaller, 
ranging only from l.lx to 4.8x. Times can perhaps be improved further with a more efficient imple- 
mentation of forward checking. 19 The results also seem to suggest that no amount of optimization 
is going to make dynamic variable ordering and forward checking competitive with EBL/DDB on 
other problems. For one thing, there are several problems, including Att-log-a, Tsp-12, Ferry-6 etc. 
which just could not be solved even with forward checking and dynamic variable ordering. Second, 
even on the problems that could be solved, the reduction in backtracks provided by EBL/DDB is far 
greater than that provided by FC/DVO strategies. For example, on Tsp-10, the FC/DVO strategies 

17. I have also experimented with a variation of this heuristic, known as the Brelaz heuristic (Gomes et al., 1998), where
the ties among variables with the same sized live-domains are broken by picking variables that take part in the largest
number of constraints. This variation did not, however, lead to any appreciable improvement in performance.

18. The study of forward checking and dynamic variable ordering was initiated with Dan Weld. 

19. My current implementation physically removes the pruned values of a variable during the forward checking phase, and
restores values on backtracks. There are better implementations, including use of in/out flags on the values as well as
use of indexed arrays (cf. Bacchus & van Run, 1995).

Problem               Plain EBL/DDB     EBL/DDB+Sticky                    EBL/DDB+Sticky+Fold
                      Time    Btks      Time    Btks      Speedup         Time    Btks      Speedup
Rocket-ext-a (7/36)   .8      764K      .37     372K      2.2x (2.05x)    .33     347K      2.4x (2.2x)
Rocket-ext-b (7/36)   .8      569K      .18     172K      4.6x (3.3x)     .177    169K      4.5x (3.36x)
Gripper-10 (39/39)    47.95   61373K    36.9    56212K    1.29x (1.09x)   40.8    54975K    1.17x (1.12x)
Ferry-6               11.62   18318K    11.75   18151K    .99x (1.01x)    11.87   18151K    .97x (1.01x)
TSP-12 (12/12)        12.44   21482K    9.86    20948K    1.26x (1.02x)   10.18   20948K    1.22x (1.02x)
Att-log-a (11/79)     1.95    2186K     .95     1144K     2x (1.91x)      .67     781K      2.9x (2.8x)

Table 3: Utility of using sticky values along with EBL/DDB.

Problem               Graphplan       GP+DVO          Speedup       GP+FC+DVO       Speedup
                      Time(Btks)      Time(Btks)                    Time(Btks)
[column headings inferred from the caption and text; the leading row(s) of this table are illegible in the source]
Rocket-ext-a (7/36)   19.43(8128K)    14.9(5252K)     1.3x(1.5x)    14.5(1877K)     1.3x(4.3x)
Rocket-ext-b (7/36)   14.1(10434K)    7.91(4382K)     1.8x(2.4x)    6(1490K)        2.4x(7x)
Att-log-a (11/79)     >10hr           >10hr                         >10hr
Gripper-6 (11/17)     1.1(2802K)      .65(1107K)      1.7x(2.5x)    .73(740K)       1.5x(3.7x)
Tsp-10 (10/10)        89(69974K)      78(37654K)      1.14x(1.9x)   81(14697K)      1.09x(4.8x)
Tower-6 (63/63)       >10hr           >10hr                         >10hr

Table 4: Impact of forward checking and dynamic variable ordering routines on Graphplan. Times
are in cpu minutes as measured on a 500 MHz Pentium-III running Linux and Franz
Allegro Common Lisp 5. The numbers in parentheses next to times are the number of
backtracks. The speedup columns report two factors: the first is the speedup in time, and
the second is the speedup in terms of number of backtracks. While FC and DVO tend to
reduce the number of backtracks, the reduction does not always seem to show up in the
time savings.



reduce the number of backtracks from 69974K to 14697K, a 4.8x improvement. However, this pales
in comparison to the 2232K backtracks (a 31x improvement) given by EBL/DDB (see the entry in
Table 1). Notice that these results only say that variable ordering strategies do not make a dramatic
difference for Graphplan's backward search (or a DCSP compilation of the planning graph); they do
not make any claims about the utility of FC and DVO for a CSP compilation of the planning graph.
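
To make the two look-forward techniques concrete, here is a minimal Python sketch of forward
checking combined with dynamic variable ordering on a single level of goals. It is an illustration
under stated assumptions, not the implementation evaluated in Table 4: the function names, the
dictionary-of-lists representation of live domains, and the toy data are all hypothetical. Only the
relation "assigning A5 to P1 filters A9 from P4" is taken from the text; the other goals, actions,
and mutex pairs are invented for the example. Like the implementation described in footnote 19,
the sketch physically removes pruned values and restores them on backtracking.

```python
def forward_check(value, live, mutex):
    """Prune from every live domain the actions that are mutex with `value`.

    Returns the list of (goal, action) pairs removed, or None when some
    domain is wiped out (in which case the pruning is undone first)."""
    pruned = []
    for goal, dom in live.items():
        for action in list(dom):
            if frozenset((value, action)) in mutex:
                dom.remove(action)
                pruned.append((goal, action))
        if not dom:
            restore(live, pruned)
            return None
    return pruned


def restore(live, pruned):
    """Put pruned actions back into their domains (done on backtracking)."""
    for goal, action in pruned:
        live[goal].append(action)


def solve(live, mutex, assignment=None):
    """Backtracking search over the unassigned goals in `live` with FC and DVO."""
    assignment = {} if assignment is None else assignment
    if not live:
        return dict(assignment)
    goal = min(live, key=lambda g: len(live[g]))  # DVO: smallest live domain
    dom = live.pop(goal)
    for action in list(dom):
        pruned = forward_check(action, live, mutex)
        if pruned is not None:
            assignment[goal] = action
            result = solve(live, mutex, assignment)
            if result is not None:
                return result
            del assignment[goal]
            restore(live, pruned)
    live[goal] = dom
    return None


# Hypothetical toy level: goals, their supporting actions, and mutex action pairs.
mutex = {frozenset(p) for p in [("A5", "A9"), ("A6", "A8"), ("A7", "A11")]}


def toy_goals():
    return {"P1": ["A5"], "P2": ["A6", "A11"], "P3": ["A7"], "P4": ["A8", "A9"]}


live = toy_goals()
forward_check("A5", live, mutex)
print(live["P4"])                    # A9 is no longer in the live domain of P4
print(solve(toy_goals(), mutex))     # this toy level has no consistent assignment
relaxed = mutex - {frozenset(("A7", "A11"))}
print(solve(toy_goals(), relaxed))   # dropping one mutex makes it solvable
```

Note that this sketch performs chronological backtracking only; it records no conflict sets and
learns no memos, which is exactly the capability that EBL/DDB adds.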

7.1 Complementing EBL/DDB with Forward Checking and Dynamic Variable Ordering 

Although forward checking and dynamic variable ordering approaches were not found to be
particularly effective in isolation for Graphplan's backward search, I thought that it would be
interesting to revisit them in the context of a Graphplan enhanced with EBL/DDB strategies. Part of
the original reasoning underlying the expectation that goal (variable) ordering will not have a
significant effect on Graphplan performance is based on the fact that all the failing goal sets are
stored in toto as memos (Blum & Furst, 1997, p. 290). This reason no longer holds when we use
EBL/DDB. Furthermore, there exists some difference of opinion as to whether or not forward
checking and DDB can fruitfully co-exist. The results of Prosser (1993) suggest that domain
filtering, such as that afforded by forward checking, degrades intelligent backtracking. The more
recent work (Frost & Dechter, 1994; Bayardo & Schrag, 1997), however, seems to suggest that the
best CSP algorithms should have both capabilities.

Problem         EBL             EBL+DVO                         EBL+FC+DVO
                Time(Btks)      Time(Btks)      Speedup         Time(Btks)      Speedup
Huge-fct        3.08(2004K)     1.51(745K)      2x(2.68x)       2.57(404K)      1.2x(5x)
BW-Large-B      2.27(798K)      1.81(514K)      1.25x(1.6x)     2.98(333K)      .76x(2.39x)
Rocket-ext-a    .8(764K)        .4(242K)        2x(3.2x)        .73(273K)       1.09x(2.8x)
Rocket-ext-b    .8(569K)        .29(151K)       2.75x(3.76x)    .72(195K)       1.1x(2.9x)
Att-log-a       1.97(2186K)     2.59(1109K)     .76x(1.97x)     3.98(1134K)     .5x(1.92x)
Tower-6         2.53(4098K)     3.78(3396K)     .67x(1.2x)      2.09(636K)      1.2x(6.4x)
TSP-10          .99(2232K)      1.27(1793K)     .77x(1.24x)     1.34(828K)      .73x(2.7x)

Table 5: Effect of complementing EBL/DDB with dynamic variable ordering and forward checking
strategies. The speedup columns report two factors: the first is the speedup in time, and
the second is the speedup in terms of number of backtracks. While FC and DVO tend to
reduce the number of backtracks, the reduction does not always seem to show up in the
time savings.

While adding plain DVO capability on top of EBL/DDB presents no difficulties, adding forward
checking does require some changes to the algorithm in Figure 4. The difficulty arises because a
failure may have occurred as a combined effect of the forward checking and backtracking. For
example, suppose we have four variables v1 ... v4 that are being considered for assignment in that
order. Suppose v3 has the domain {1, 2, 3}, and v3 cannot be 1 if v1 is a, and cannot be 2 if v2 is
b. Suppose further that v4's domain only contains d, and there is a constraint saying that v4 can't
be d if v1 is a and v3 is 3. Suppose we are using forward checking, and have assigned v1, v2 the
values a and b. Forward checking prunes 1 and 2 from v3's domain, leaving only the value 3. At
this point, we try to assign v4 and fail. If we use the algorithm in Figure 4, the conflict set for v4
would be {v4, v3, v1}, as the constraint that is violated is v1 = a ∧ v3 = 3 ∧ v4 = d. However, this
is not sufficient, since the failure at v4 may not have occurred if forward checking had not stripped
the value 2 from the domain of v3. This problem can be handled by pushing v1 and v2, the variables
whose assignment stripped some values from v3, into v3's conflict set (footnote 20). Specifically, the
conflict set of every variable v is initialized to {v} to begin with, and whenever v loses a value during
forward checking with respect to the assignment of v', v' is added to the conflict set of v. Whenever
a future variable (such as v4) conflicts with v3, we add the conflict set of v3 (rather than just v3) to
the conflict set of v4. Specifically the line

"Set cs = cs ∪ { l }"

in the procedure in Figure 4 is replaced with the line

"Set cs = cs ∪ Conflict-set(l)"

20. Notice that it is possible that the values that were stripped off from v3's domain may not have had any impact on the
failure to assign v4. For example, perhaps there is another constraint that says that v4 can't be d if v2 is b, and in
that case, strictly speaking, the assignment of v1 cannot really be blamed for the failure at v4. While this leads to
non-minimal explanations, there is no reason to expect that strict minimization of explanations is a pre-requisite for
the effectiveness of EBL/DDB; see (Kambhampati, 1998).

I have incorporated the above changes into my implementation, so it can support forward
checking, dynamic variable ordering as well as EBL on Graphplan. Table 5 shows the performance
of this version on the experimental test suite. As can be seen from the numbers, the number
of backtracks is reduced by up to 3.7x in the case of EBL+DVO, and up to 5x in the case of
EBL+FC+DVO. The cpu time improvements are somewhat lower. While we got up to 2.7x speedup
with EBL+DVO, and up to 1.2x speedup with EBL+FC+DVO, in several cases the cpu times
increase with FC and DVO. Once again, I attribute this to the overheads of forward checking (and,
to a lesser extent, of dynamic variable ordering). Most importantly, by comparing the results in
Tables 4 and 5, we can see that EBL/DDB capabilities are able to bring about significant speedups
even over a Graphplan implementation using FC and DVO.
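
The conflict-set bookkeeping described in this section can be replayed on the four-variable example.
The Python sketch below is illustrative only and is not the procedure of Figure 4: the function
names, the tuple encoding of the binary constraints, and the dictionary encoding of the three-variable
constraint are assumptions introduced here. It contrasts the explanation obtained when only the
directly conflicting variables are blamed with the one obtained when their accumulated conflict sets
are used instead.

```python
def forward_check(var, value, domains, conflict_set, binary_nogoods):
    """Prune values ruled out by `var = value`; every variable that loses a
    value gets `var` added to its conflict set."""
    for culprit, culprit_value, victim, victim_value in binary_nogoods:
        if culprit == var and culprit_value == value and victim_value in domains[victim]:
            domains[victim].remove(victim_value)
            conflict_set[victim].add(var)


def explain_failure(var, value, assignment, nogood, conflict_set, use_conflict_sets):
    """Return the conflict set for the failed attempt `var = value`.

    With use_conflict_sets=False only the variables named in the violated
    constraint are blamed; with True, their whole conflict sets are added."""
    trial = dict(assignment)
    trial[var] = value
    cs = {var}
    if all(trial.get(v) == x for v, x in nogood.items()):
        for other in nogood:
            if other != var:
                cs |= conflict_set[other] if use_conflict_sets else {other}
    return cs


domains = {"v1": ["a"], "v2": ["b"], "v3": [1, 2, 3], "v4": ["d"]}
conflict_set = {v: {v} for v in domains}           # every set starts as {v}
binary_nogoods = [("v1", "a", "v3", 1), ("v2", "b", "v3", 2)]
ternary_nogood = {"v1": "a", "v3": 3, "v4": "d"}   # these cannot hold together

assignment = {}
for var, value in [("v1", "a"), ("v2", "b")]:
    assignment[var] = value
    forward_check(var, value, domains, conflict_set, binary_nogoods)
assignment["v3"] = domains["v3"][0]                # only the value 3 survives

plain = explain_failure("v4", "d", assignment, ternary_nogood, conflict_set, False)
augmented = explain_failure("v4", "d", assignment, ternary_nogood, conflict_set, True)
print(sorted(plain))
print(sorted(augmented))
```

The augmented explanation additionally contains v2, whose assignment removed the value 2 from the
domain of v3; without it, dependency directed backtracking could skip over v2 unsoundly.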

8. EBL/DDB & Randomized Search 

Recent years have seen increased use of randomized search strategies in planning. These include 
both purely local search strategies (Gerevini, 1999; Selman, Levesque, & Mitchell, 1992) as well 
as hybrid strategies that introduce a random restart scheme on top of a systematic search strategy 
(Gomes et al., 1998). The BLACKBOX planning system (Kautz & Selman, 1999) supports a variety
of random restart strategies on top of a SAT compilation of the planning graph, and empirical 
studies show that these strategies can, probabilistically speaking, scale up much better than purely 
systematic search strategies. 

I wanted to investigate if (and by how much) EBL & DDB techniques will help Graphplan 
even in the presence of these newer search strategies. While EBL and DDB techniques have little 
applicability to purely local search strategies, they could in theory help random restart systematic 
search strategies. Random restart strategies are motivated by an attempt to exploit the "heavy-tail"
distribution (Gomes et al., 1998) of the solution nodes in the search trees of many problems.
Intuitively, in problems where there is a non-trivial percentage of very easy to find solutions as
well as very hard to find solutions, it makes sense to restart the search when we find that we are 
spending too much effort for a solution. By restarting this way, we hope to (probabilistically) hit on 
the easier-to-find solutions. 

I implemented a random-restart strategy on top of Graphplan by making the following simple 
modifications to the backward search: 

1. We keep track of the number of times the backward search backtracks from one level of the
plan graph to a previous level (a level closer to the goal state), and whenever this number
exceeds a given limit (called the backtrack limit), the search is restarted (by going back to the last
level of the plan graph), assuming that the number of restarts has not also exceeded the given
limit. The search process between any two restarts is referred to as an epoch.

2. The supporting actions (values) for a proposition variable are considered in a randomized
order. It is this randomization that ensures that when the search is restarted, we will look at
the values of each variable in a different order (footnote 21).

Notice that random-restart strategy still allows the application of EBL and DDB strategies, since 
during any given epoch, the behavior of the search is identical to that of the standard backward 
search algorithm. Indeed, as the backtrack limit and the number of restarts are made larger and 
larger, the whole search becomes identical to standard backward search. 
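
The two modifications can be sketched on a generic backtracking search. The Python code below is
a simplified illustration, not the Graphplan implementation: it counts every backtrack rather than
only inter-level backtracks, it searches a flat CSP rather than a planning graph, and it carries no
memos across epochs. All names (`search`, `random_restart_search`, `BacktrackLimitExceeded`) and
the N-queens test problem are assumptions chosen for the example.

```python
import random


class BacktrackLimitExceeded(Exception):
    """Raised when an epoch uses up its backtrack budget."""


def search(variables, domains, consistent, assignment, budget, rng):
    """Chronological backtracking with a randomized value order."""
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]
    values = list(domains[var])
    rng.shuffle(values)                 # modification 2: random value order
    for value in values:
        assignment[var] = value
        if consistent(assignment):
            result = search(variables, domains, consistent, assignment, budget, rng)
            if result is not None:
                return result
        del assignment[var]
    budget[0] -= 1                      # all values failed: one backtrack
    if budget[0] < 0:                   # modification 1: backtrack limit
        raise BacktrackLimitExceeded
    return None


def random_restart_search(variables, domains, consistent,
                          backtrack_limit, restart_limit, seed=0):
    """Return (solution or None, number of epochs started)."""
    rng = random.Random(seed)
    for epoch in range(1, restart_limit + 2):    # initial run plus restarts
        try:
            return search(variables, domains, consistent, {},
                          [backtrack_limit], rng), epoch
        except BacktrackLimitExceeded:
            continue                             # start a new epoch
    return None, restart_limit + 1


def queens_ok(assignment):
    """True if no two queens (row -> column) attack each other."""
    rows = sorted(assignment)
    return all(assignment[r1] != assignment[r2]
               and abs(assignment[r1] - assignment[r2]) != r2 - r1
               for i, r1 in enumerate(rows) for r2 in rows[i + 1:])


n = 8
rows = list(range(n))
solution, epochs = random_restart_search(rows, {r: range(n) for r in rows}, queens_ok,
                                         backtrack_limit=10, restart_limit=50)
print(solution, epochs)
```

As in the text, making the backtrack limit very large turns the procedure back into an ordinary
systematic search: the first epoch then either finds a solution or exhausts the search space.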

21. Reordering values of a variable doesn't make a whole lot of sense in BLACKBOX which is based on SAT encodings 
and thus has only boolean variables. Thus, the randomization in BLACKBOX is done on the order in which the goals 
are considered for assignment. This typically tends to clash with the built-in goal ordering strategies (such as DVO 
and SAT-Z (Li & Anbulagan, 1997)), and they get around this conflict by breaking ties among variables randomly. 
To avoid such clashes, I decided to randomize Graphplan by reordering values of a variable. I also picked inter-level 
backtracks as a more natural parameter characterizing the difficulty of a problem for Graphplan's backward search. 






                               Graphplan with EBL/DDB               Normal Graphplan
Problem             R/B/L      %sol  Length      Time  Av. MFSL     %sol  Length       Time  Av. MFSL
Att-log-a (11/54)   5/50/20    99%   14(82)      .41   4.6K(28K)    2%    19(103)      .21   .3K(3.7K)
Att-log-a (11/54)   10/100/20  100%  11.3(69.5)  .72   17.8K(59K)   11%   17.6(100.5)  1.29  3.7K(41K)
Att-log-a (11/54)   10/100/30  100%  11.3(69.5)  .72   17.8K(59K)   54%   25.6(136)    3     4K(78K)
Att-log-a (11/54)   20/200/20  100%  11(68.5)    2.38  73K(220K)    13%   18(97.5)     3     31K(361K)
Att-log-a (11/54)   20/200/30  100%  11(68.5)    2.38  73K(220K)    94%   22.1(119.3)  31    33K(489K)
Att-log-b (13/47)   5/50/20    17%   18.1(101)   1.62  8K(93K)      0%    -            -     .2K(4K)
Att-log-b (13/47)   10/100/20  60%   17.3(98)    11.4  69K(717K)    0%    -            -     2.6K(53K)
Att-log-b (13/47)   10/100/30  100%  20.1(109)   15.3  74K(896K)    3%    28(156)      4     5K(111K)
Att-log-c (13/65)   5/50/30    55%   22.85(124)  2.77  8K(145K)     2%    26.5(135)    .75   .4K(8K)
Att-log-c (13/65)   10/100/30  100%  19.9(110)   14    71K(848K)    2%    29(152)      4     3.7K(111K)
Rocket-ext-a (7/34) 10/100/30  100%  7.76(35.8)  1.3   29K(109K)    58%   21.24(87.3)  2     .2K(4K)
Rocket-ext-a (7/34) 20/200/30  100%  7(34.1)     1.32  38K(115K)    90%   21.3(85)     8.1   2.3K(43K)
Rocket-ext-a (7/34) 40/400/30  100%  7(34.2)     1.21  35K(105K)    100%  15.3(62.5)   45    35K(403K)

Table 6: Effect of EBL/DDB on random-restart Graphplan. Time is measured in cpu minutes on Allegro Common Lisp 5.0 running on a
Linux 500 MHz Pentium machine. The numbers next to the problem names are the number of steps and actions in the shortest
plans reported for those problems in the literature. The R/B/L parameters in the second column refer to the limits on the number
of restarts, number of backtracks and the number of levels to which the plan graph is expanded. All the statistics are averaged over
multiple runs (typically 100 or 50). The "Av. MFSL" column gives the average number of memo-based failures per searched level of
the plan graph. The numbers in parentheses are the total number of memo-based failures averaged over all runs. Plan lengths were
averaged only over the successful runs.

To check if my intuitions about the effectiveness of EBL/DDB in randomized search were indeed
correct, I conducted an empirical investigation comparing the performance of random search
on standard Graphplan as well as Graphplan with EBL/DDB capabilities. Since the search is
randomized, each problem is solved multiple times (100 times in most cases), and the run time,
plan length and other statistics are averaged over all the runs. The experiments are conducted
with a given backtrack limit, a given restart limit, as well as a limit on the number of levels to which
the planning graph is extended. This last one is needed because, in randomized search, a solution
may be missed at the first level at which it appears, leading to a prolonged extension of the planning
graph until a (suboptimal) solution is found at a later level. When the limit on the number of levels
is increased, the probability of finding a solution increases, but at the same time, the cpu time spent
searching the graph also increases.

Having implemented this random restart search, the first thing I noticed is an improvement in 
the solvability horizon (as expected, given the results in (Gomes et al., 1998)). Table 6 shows these 
results. One important point to note is that the results in the table above talk about average plan 
lengths and cpu times. This is needed because, due to randomization, each run can potentially produce
a different outcome (plan). Secondly, while Graphplan with systematic search guarantees shortest 
plans (measured in the number of steps), the randomized search will not have such a guarantee. 
In particular, the randomized version might consider a particular planning graph to be barren of 
solutions, based simply on the fact that no solution could be found within the confines of the given 
backtrack limit and number of restarts. 

Graphplan, with or without EBL/DDB, is more likely to solve larger problems with randomized 
search strategies. For example, in the logistics domain, only the Att-log-a problem was solvable 
(within 24 hours real time) with EBL and systematic search. With the randomization added, my 
implementation was able to solve both Att-log-b and Att-log-c quite frequently. As the limits on the
number of restarts, backtracks and levels are increased, the likelihood of finding a solution as well as
the average length of the solution found improves. For example, Graphplan with EBL/DDB is able
to solve Att-log-b in every trial for 10 restarts, 100 backtracks and 30 levels as the limits (although
the plans are quite suboptimal).

The next, and perhaps more interesting, question I wanted to investigate is whether EBL and 
DDB will continue to be useful for Graphplan when it uses randomized search. At first blush, 
it seems as if they will not be as important; after all, even Graphplan with standard search may
luck out and be able to find solutions quickly in the presence of randomization. Further thought 
however suggests that EBL and DDB may still be able to help Graphplan. Specifically, they can 
help Graphplan in using the given backtrack limit in a more judicious fashion. To elaborate, suppose 
the random restart search is being conducted with 100 backtracks and 10 restarts. With EBL and 
DDB, Graphplan is able to pinpoint the cause of the failure more accurately than without EBL and 
DDB. This means that when the search backtracks, the chance that it will have to backtrack again 
for the same (or similar) reasons is reduced. This in turn gives the search a better chance of
finding a solution during one of the allowed epochs. All this is in addition to the more
direct benefit of being able to use the stored memos across epochs to cut down search. 

As can be seen from the data in Table 6, for a given set of limits on number of restarts, number
of backtracks, and number of levels expanded, Graphplan with EBL/DDB is able to get a higher
percentage of solvability as well as significantly shorter solutions (both in terms of levels and
in terms of actions). To get comparable results on the standard Graphplan, I had to significantly
increase the input parameters (restarts, backtracks and levels expanded), which in turn led to
dramatic increases in the average run time. For example, for the Att-log-a problem, with 5 restarts,
50 backtracks, and a 20-level limit, Graphplan with EBL/DDB was able to solve the problem 99% of
the time, with an average plan length of 14 steps and 82 actions. In contrast, without EBL/DDB,
Graphplan was able to solve the problem in only 2% of the cases, with an average plan length of 19
steps and 103 actions. If we double the restarts and backtracks, the EBL/DDB version goes to 100%
solvability with an average plan length of 11.33 steps and 69.53 actions. The standard Graphplan
goes to 11% solvability and a plan length of 17.6 steps and 100 actions. If we increase the number of
levels to 30, then the standard Graphplan solves the problem 54% of the time, with an average plan
length of 25.6 steps and 136 actions. It takes 20 restarts and 200 backtracks, as well as a 30-level
limit, before standard Graphplan is able to cross 90% solvability. By this time, the average run time
is 31 minutes, and the average plan length is 22 steps and 119 actions. The contrast between this and
the 99% solvability in 0.4 minutes with 14-step, 82-action plans provided by Graphplan with EBL and
5 restarts and 50 backtracks is significant! Similar results were observed in other problems, both in
logistics (Att-log-b, Att-log-c) and other domains (Rocket-ext-a, Rocket-ext-b).

The results also show that Graphplan with EBL/DDB is able to generate and reuse memos
effectively across different restart epochs. Specifically, the numbers in the columns titled "Av. MFSL"
give the average number of memo-based failures per search level (footnote 22). We note that in all cases, the
average number of memo-based failures are significantly higher for Graphplan with EBL than for 
normal Graphplan. This shows that EBL/DDB analysis is helping Graphplan reduce wasted effort 
significantly, and thus reap better benefits out of the given backtrack and restart limits. 

9. Related Work 

In their original implementation of Graphplan, Blum and Furst experimented with a variation of the 
memoization strategy called "subset memoization". In this strategy, they keep the memo generation 
techniques the same, but change the way memos are used, declaring a failure when a stored memo 
is found to be a subset of the current goal set. Since complete subset checking is costly, they 
experimented with a "partial" subset memoization where only subsets of length n and n - 1 are
considered for an n sized goal set. 
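
The difference between the three ways of using stored memos can be sketched in a few lines of
Python. This is an illustration only: memos are assumed to be stored as a set of frozensets of goal
names, the function names are hypothetical, and the complete subset check below is a naive linear
scan rather than the UB-Tree structure discussed later in this section.

```python
from itertools import combinations


def exact_memo_hit(goals, memos):
    """Standard memoization: fail only if this exact goal set was stored."""
    return frozenset(goals) in memos


def subset_memo_hit(goals, memos):
    """Complete subset memoization: fail if any stored memo is a subset."""
    g = frozenset(goals)
    return any(memo <= g for memo in memos)


def partial_subset_memo_hit(goals, memos):
    """Partial subset memoization: look only at subsets of size n and n - 1."""
    g = frozenset(goals)
    if g in memos:
        return True
    if not g:
        return False
    return any(frozenset(c) in memos for c in combinations(g, len(g) - 1))


# Hypothetical stored memos and goal sets.
memos = {frozenset({"P1", "P2"}), frozenset({"P3"})}
three_goals = {"P1", "P2", "P4"}
four_goals = {"P1", "P3", "P4", "P5"}

print(exact_memo_hit(three_goals, memos),
      subset_memo_hit(three_goals, memos),
      partial_subset_memo_hit(three_goals, memos))
print(subset_memo_hit(four_goals, memos),
      partial_subset_memo_hit(four_goals, memos))
```

The second goal set shows what the partial check gives up: the one-element memo {P3} explains the
failure, but it is too short to be found among the subsets of size 4 and 3.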

As we mentioned earlier, Koehler and her co-workers (Koehler et al., 1997) have re-visited the 
subset memoization strategy, and developed a more effective solution to complete subset checking 
that involves storing the memos in a data structure called UB-Tree, instead of in hash tables. The 
results from their experiments with subset memoization are mixed, indicating that subset memoiza- 
tion does not seem to improve the cpu time performance significantly. The reason for this is quite 
easy to understand - while they improved the memo checking time with the UB-Tree data structure, 
they are still generating and storing the same old long memos. In contrast, the EBL/DDB extension 
described here supports dependency directed backtracking, and by reducing the average length of 
stored memos, increases their utility significantly, thus offering dramatic speedups. 

To verify that the main source of power in the EBL/DDB-Graphplan is in the EBL/DDB part 
and not in the UB-Tree based memo checking, I re-ran my experiments with EBL/DDB turned off, 

22. Notice that the number of search levels may be different from (and smaller than) the number of planning graph levels,
because Graphplan initiates a search only when none of the goals are pair-wise mutex with each other. In Att-log-a,
Att-log-b and Att-log-c, this happens starting at level 9. For Rocket-ext-a it happens starting at level 5. The numbers
in parentheses are the total number of memo-based failures. We divide this number by the average number of levels
in which search was conducted to get the "Av. MFSL" statistic.

Table 7: Performance of subset memoization with UB-Tree data structure (without EBL/DDB). The
"Tt" is the total cpu time and "Mt" is the time taken for checking memos. "#Btks" is the
number of backtracks. "EBLx" is the amount of speedup offered by EBL/DDB over subset
memoization. "#Gen" lists the number of memos generated (and stored), "#Fail" lists the
number of memo-based failures, "AvFM" is the average number of failures identified per
generated memo and "AvLn" is the average length of stored memos.



but with subset memo checking with UB-Tree data structure still enabled. The results are shown in
Table 7. The columns labeled "AvFM" show that, as expected, subset memoization does improve
the utility of stored memos over normal Graphplan (since it uses a memo in more scenarios than 
normal Graphplan can). However, we also note that subset memoization by itself does not have any 
dramatic impact on the performance of Graphplan, and that EBL/DDB capability can significantly 
enhance the savings offered by subset memoization. 

In (Kambhampati, 1998), I describe the general principles underlying the EBL/DDB techniques 
and sketch how they can be extended to dynamic constraint satisfaction problems. The development 
in this paper can be seen as an application of the ideas there. Readers needing more background 
on EBL/DDB are thus encouraged to review that paper. Other related work includes previous
attempts at applying EBL/DDB to planning algorithms, such as the work on the UCPOP+EBL system
(Kambhampati et al., 1997). One interesting contrast is the ease with which EBL/DDB can be added
to Graphplan as compared to the UCPOP system. Part of the difference comes from the fact that the
search in Graphplan is ultimately on a propositional dynamic CSP, while UCPOP's search is a
variablized problem-solving search.

As I mentioned in Section 2, Graphplan's planning graph can also be compiled into a normal CSP
representation, rather than the dynamic CSP representation. I used the dynamic CSP representation
as it corresponds quite directly to the backward search used by Graphplan. We saw that the
model provides a clearer picture of the mutex propagation and memoization strategies, and helps us
unearth some of the sources of strength in the Graphplan memoization strategy, including the fact
that memos are a very conservative form of no-good learning that obviates the need for no-good
management strategies to a large extent.

The dynamic CSP model may also account for some of the peculiarities of the results of my
empirical studies. For example, it is widely believed in the CSP literature that forward checking and
dynamic variable ordering are either as critical as, or perhaps even more critical than, the EBL/DDB
strategies (Bacchus & van Run, 1995; Frost & Dechter, 1994). Our results however show that for
Graphplan, which uses the dynamic CSP model of search, DVO and FC are largely ineffective
compared to EBL/DDB on the standard Graphplan. To some extent, this may be due to the fact that
Graphplan already has a primitive form of EBL built into its memoization strategy. In fact, Blum
& Furst (1997) argue that with memoization and a minimal action set selection (an action set is
considered minimal if it is not possible to remove an action from the set and still support all the
goals for which the actions were selected), the ordering of goals will have little effect (especially in
the earlier levels that do not contain a solution).

Another reason for the ineffectiveness of the dynamic variable ordering heuristic may have to
do with the differences between the CSP and DCSP problems. In DCSP, the main aim is not just to
quickly find an assignment for the current level variables, but rather to find an assignment for
the current level which is likely to activate fewer and easier to assign variables, whose assignment
in turn leads to fewer and easier to assign variables and so on. The general heuristic of picking the
variable with the smallest (live) domain does not necessarily make sense in DCSP, since a variable
with two actions supporting it may actually be much harder to handle than another with many
actions supporting it, if each of the actions supporting the first one eventually leads to activation of
many more and harder to assign new variables. It may thus be worth considering ordering strategies
that are more customized to the dynamic CSP models, e.g., orderings that are based on the number
(and difficulty) of variables that get activated by a given variable (or value) choice.

We have recently experimented with a value-ordering heuristic that picks the value to be
assigned to a variable using the distance estimates of the variables that will be activated by that choice
(Kambhampati & Nigenda, 2000). The planning graph provides a variety of ways of obtaining these
distance estimates. The simplest idea would be to say that the distance of a proposition p is the level
at which p enters the planning graph for the first time. This distance estimate can then be used
to rank variables and their values. Variables can be ranked simply in terms of their distances: the
variables that have the highest distance are chosen first (akin to the fail-first principle). Value ordering
is a bit trickier: for a given variable, we need to pick an action whose precondition set has the lowest
distance. The distance of the precondition set can be computed from the distance of the individual
preconditions in several ways:

• Maximum of the distances of the individual propositions making up the preconditions. 

• Sum of the distances of the individual propositions making up the preconditions. 

• The first level at which the set of propositions making up the preconditions are present and 
are non-mutex. 
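
The three options listed above can be written down directly. The Python sketch below is only an
illustration under assumed data structures: `level` maps each proposition to the first planning-graph
level at which it appears, and `mutex_at` maps a level number to the set of proposition pairs that
are mutex at that level. These names, the helper functions, and the toy numbers are hypothetical and
are not taken from (Kambhampati & Nigenda, 2000).

```python
from itertools import combinations


def max_distance(preconds, level):
    """Maximum of the first-appearance levels of the individual propositions."""
    return max((level[p] for p in preconds), default=0)


def sum_distance(preconds, level):
    """Sum of the first-appearance levels of the individual propositions."""
    return sum(level[p] for p in preconds)


def set_level_distance(preconds, level, mutex_at, max_level):
    """First level at which all propositions are present and pairwise
    non-mutex, or None if there is no such level up to `max_level`."""
    for k in range(max_distance(preconds, level), max_level + 1):
        mutexes = mutex_at.get(k, set())
        if not any(frozenset(pair) in mutexes
                   for pair in combinations(sorted(preconds), 2)):
            return k
    return None


# Hypothetical planning-graph summary.
level = {"p": 0, "q": 2, "r": 3}
mutex_at = {3: {frozenset(("q", "r"))}}        # q and r are mutex at level 3 only
preconditions = {"A1": {"q", "r"}, "A2": {"p", "q"}}

# Goal (variable) ordering: highest distance first.
goal_order = sorted(level, key=lambda g: -level[g])
# Value ordering: action whose precondition set has the lowest distance.
best_action = min(sorted(preconditions),
                  key=lambda a: sum_distance(preconditions[a], level))

print(goal_order, best_action)
print(max_distance(preconditions["A1"], level),
      sum_distance(preconditions["A1"], level),
      set_level_distance(preconditions["A1"], level, mutex_at, max_level=5))
```

In this toy example the third measure is strictly larger than the maximum, because the two
preconditions of A1 first co-occur at a level where they are still mutex.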

In (Kambhampati & Nigenda, 2000), we evaluate goal and value ordering strategies based on
these ideas, and show that they can lead to quite impressive (up to 4 orders of magnitude in our
tests) speedups in solution-bearing planning graphs. We also relate the distances computed through
the planning graph to the distance transforms computed by planners like HSP (Bonet, Loerincs, &
Geffner, 1999) and UNPOP (McDermott, 1999). This idea of using the planning graph as a basis
for computing heuristic distance metrics is further investigated in the context of state-space search
in (Nguyen & Kambhampati, 2000). An interesting finding in that paper is that even when one is
using state-space search instead of CSP-style solution extraction, EBL can still be useful as a lazy
demand-driven approach for discovering n-ary mutexes that can improve the informedness of the
heuristic. Specifically, Nguyen & Kambhampati describe a method where a limited run of Graphplan's
backward search, armed with EBL/DDB, is used as a pre-processing stage to explicate memos ("n-ary
mutexes"), which are then used to significantly improve the effectiveness of the heuristic on the
state-space search.






The general importance of EBL & DDB for CSP and SAT problems is well recognized. Indeed, 
one of the best systematic solvers for propositional satisfiability problems is RELSAT (Bayardo & 
Schrag, 1997), which uses EBL, DDB, and forward checking. A randomized version of RELSAT is 
one of the solvers supported by the BLACKBOX system (Kautz & Selman, 1999), which compiles 
the planning graph into a SAT encoding, and ships it to various solvers. BLACKBOX thus offers 
a way of indirectly comparing the dynamic CSP and static CSP models for solving the planning
graph. As discussed in Section 2.2, the main difference is that BLACKBOX needs to compile
the planning graph into an extensional SAT representation. This makes it harder for BLACKBOX
to exploit the results of searches in previous levels (as Graphplan does with its stored memos), 
and also leads to memory blowups. The latter is particularly problematic as the techniques for 
condensing planning graphs, such as the bi-level representation discussed in (Fox & Long, 1999; 
Smith & Weld, 1999), will not be effective when we compile the planning graph to SAT. On the 
flip side, BLACKBOX allows non-directional search, and the opportunity to exploit existing SAT 
solvers, rather than develop customized solvers for the planning graph. At present, it is not clear 
whether either of these approaches dominates the other. In my own informal experiments, I found 
that certain problems, such as Att-log-x, are easier to solve with the non-directional search offered by 
BLACKBOX, while others, such as Gripper-x, are easier to solve with the Graphplan backward 
search. The results of the recent AIPS planning competition are also inconclusive in this respect 
(McDermott, 1998). 

While my main rationale for focusing on the dynamic CSP model of the planning graph is its 
closeness to Graphplan's backward search, Gelle (1998) argues that keeping activity constraints 
distinct from value constraints has several advantages in terms of modularity of the representation. 
In Graphplan, this advantage becomes apparent when not all activity constraints are known a 
priori, but are posted dynamically during search. This is the case in several extensions of the 
Graphplan algorithm that handle conditional effects (Kambhampati et al., 1997; Anderson, Smith, 
& Weld, 1998; Koehler et al., 1997), and incomplete initial states (Weld, Anderson, & Smith, 1998). 

Although EBL and DDB strategies try to exploit the symmetry in the search space to improve the 
search performance, they do not go far enough in many cases. For example, in the Gripper domain, 
the real difficulty is that search gets lost in the combinatorics of deciding which hand should be used 
to pick which ball for transfer into the next room, a decision which is completely irrelevant for the 
quality of the solution (or the search failures, for that matter). While EBL/DDB allow Graphplan 
to cut the search down a bit, allowing transfer of up to 10 balls from one room to another, they 
are overcome beyond 10 balls. There are two possible ways of scaling further. The first is to 
"variablize" memos, and realize that certain types of failures would have occurred irrespective of 
the actual identity of the hand that is used. Variablization, also called "generalization," is a part 
of EBL methods (Kambhampati, 1998; Kambhampati et al., 1997). Another way of scaling up 
in such situations would be to recognize the symmetry inherent in the problem and abstract the 
resources from the search. In (Srivastava & Kambhampati, 1999), we describe this type of resource 
abstraction approach for Graphplan. 
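
The effect of variablizing a memo can be illustrated with a minimal Python sketch. It is not the generalization procedure of the cited EBL work: rather than deriving variables from the failure explanation, it simply assumes that a given set of objects (here the two gripper hands) is interchangeable in the initial state, goals and operators, and tests a goal set against every renaming of a ground memo. The proposition encoding as tuples and the names `renamings` and `hits_variablized_memo` are hypothetical.

```python
from itertools import permutations


def renamings(memo, interchangeable):
    """Yield every variant of a ground memo obtained by permuting the
    objects assumed to be interchangeable (e.g., the two hands)."""
    objs = sorted(interchangeable)
    for perm in permutations(objs):
        rename = dict(zip(objs, perm))
        yield frozenset(
            tuple(rename.get(term, term) for term in prop) for prop in memo
        )


def hits_variablized_memo(goals, memo, interchangeable):
    """A goal set is pruned if some renaming of the memo is a subset of it.
    Assumption: the listed objects really are symmetric in the problem."""
    goals = frozenset(goals)
    return any(variant <= goals for variant in renamings(memo, interchangeable))


# A memo learned with the left hand also prunes the right-hand variant.
memo = {("carry", "ball1", "left"), ("carry", "ball2", "left")}
goals = {("carry", "ball1", "right"), ("carry", "ball2", "right")}
pruned = hits_variablized_memo(goals, memo, {"left", "right"})
```

Enumerating permutations is only reasonable for a handful of interchangeable objects; a memo stored with explicit variables and matched by unification would avoid the enumeration.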

10. Conclusion and Future work 

In this paper, I traced the connections between the Graphplan planning graph and CSP, and 
motivated the need for exploiting CSP techniques to improve the performance of Graphplan's 
backward search. I then adapted and evaluated several CSP search techniques in the context of 
Graphplan. These included EBL, DDB, forward checking, dynamic variable ordering, sticky values, and 
random-restart search. My empirical studies show that EBL/DDB is particularly useful in dramatically 
speeding up Graphplan's backward search (by up to 1000x in some instances). The speedups 
can be improved further (by up to 8x) with the addition of forward checking, dynamic variable 
ordering and sticky values on top of EBL/DDB. I also showed that EBL/DDB techniques are equally 
effective in helping Graphplan, even if random-restart search strategies are used. 

A secondary contribution of this paper is a clear description of the connections between the 
Graphplan planning graph, and the (dynamic) constraint satisfaction problem. These connections 
help us understand some unique properties of the Graphplan memoization strategy, when viewed 
from a CSP standpoint (see Section 9). 

There are several possible ways of extending this work. The first would be to support the 
use of learned memos across problems (or when the specification of a problem changes, as is the 
case during replanning). Blum & Furst (1997) suggest this as a promising future direction, and 
the EBL framework described here makes the extension feasible. As discussed in (Kambhampati, 
1998; Schiex & Verfaillie, 1993), supporting such inter-problem usage involves "contextualizing" 
the learned no-goods. In particular, since the soundness of memos depends only on the initial state 
of the problem (given that operators do not change from problem to problem), inter-problem usage 
of memos can be supported by tagging each learned memo with the specific initial state literals that 
supported that memo. Memos can then be used at the corresponding level of a new problem as 
long as their initial state justification holds in the new problem. The initial state justification for 
the memos can be computed incrementally by a procedure that first justifies the propagated mutex 
relations in terms of the initial state, and then justifies individual memos in terms of the justifications 
of the mutexes and other memos from which they are derived. 
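
The tagging scheme described above might be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation from this paper: literals are plain strings, a justification "holds" when all of its literals appear in the new initial state (so any dependence on a literal being absent would have to be recorded as an explicit negative literal), and the names `ContextualMemo`, `derive_memo` and `reusable_memos` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ContextualMemo:
    level: int                     # planning-graph level where the memo was learned
    goals: FrozenSet[str]          # subgoal set found unachievable at that level
    justification: FrozenSet[str]  # initial-state literals supporting the memo


def derive_memo(level: int, goals: Iterable[str],
                supports: Iterable[FrozenSet[str]]) -> ContextualMemo:
    """Build a memo whose justification is the union of the justifications
    of the mutexes and memos it was derived from (the incremental step)."""
    justification = frozenset().union(*supports)
    return ContextualMemo(level, frozenset(goals), justification)


def reusable_memos(memos: Iterable[ContextualMemo],
                   new_initial_state: Iterable[str]) -> List[ContextualMemo]:
    """Keep the memos whose justification still holds in the new problem."""
    init = frozenset(new_initial_state)
    return [m for m in memos if m.justification <= init]


# A memo derived from two supporting justifications (illustrative literals).
memo = derive_memo(
    3,
    {"at-b1-roomB", "at-b2-roomB"},
    [frozenset({"at-b1-roomA"}), frozenset({"at-b2-roomA", "free-left"})],
)
kept = reusable_memos([memo], {"at-b1-roomA", "at-b2-roomA", "free-left"})
```

Each retained memo would then be consulted only at its recorded `level` in the new planning graph, mirroring the requirement that memos be used at the corresponding level.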

The success of EBL/DDB approaches in Graphplan is in part due to the high degree of 
redundancy in the planning graph structure. For example, the propositions (actions) at level l in a 
planning graph are a superset of the propositions (actions) at level l - 1, and the mutexes (memos) at 
level l are a subset of the mutexes (memos) at level l - 1. While the EBL/DDB techniques help 
Graphplan exploit some of this redundancy by avoiding previous failures, the exploitation of 
redundancy can be pushed further. Indeed, the search that Graphplan does on a planning graph of size l 
is almost a replay of the search it did on the planning graph of size l - 1 (with a few additional 
choices). In (Zimmerman & Kambhampati, 1999), we present a complementary technique called 
"explanation-guided backward search" that attempts to exploit this deja vu property of 
Graphplan's backward search. Our technique involves keeping track of an elaborate trace of the search at 
a level l (along with the failure information), termed the "pilot explanation" for level l, and using the 
pilot explanation to guide the search at level l + 1. The way EBL/DDB help in this process is that 
they significantly reduce the size of the pilot explanations that need to be maintained. Preliminary 
results with this technique show that it complements EBL/DDB and provides significant further 
savings in search. 